^60)6OT6II60)[JlL(Lb SufieU CT6OT.S(g)Lb ^(^^[TgrT S6II60)6ULJUn:ij.S(g)Lb u^Seugu u^Seugu

ijlg.?s^60)6OTa6rr Gl^nLca-^liLigj. S06O)6ii4(g)Lb ^^aiDneOT 

ScBijii ^^eurr^gj, LDcreoeu S^gaaeffl^ 2 i_^ulij1^^14 s^lld Off^eugj CT6OTgu u^Seugu 
qanijaerr CTeOTiJ’gj 60)6iiaaLJUL_i_6OT. (J^6 oti_i Seueoeu unrr^^ (glgu6ii6OT[aa6ffl^ 

a^^^giDn:® Seueoeu unij^gju u^.£Ilu CT6OT4(g), ^irnGlgeOTgu ^([5 

@^6un:0 (glgU6ii6OT^^^ S6ii60)6u un:ij.sff>^ Gl0n:i_[a.^Lu^n:^ 6 j^ul_i_ i!jlgffff60)6OTSLU 
@§J- @^^60)aLU Ueorfrlff^l^^ CT6OT.S(g) Sun:ff>6iil^60)6U. &T6 otS6II, 

s=L_Oi_6OTgu ^^5 ^[lerr SeueoeueoLU 6ijlL_(S\6iilL_Si_6OT. "^LD.S(g)^^n:6OT 10 eui^Lii 
'Software Testing' §j60)jdlij 1^ ^guueuLD agrrerrS^" giguib (£160)6 otui!j1^, i_i§j S 6 ii 60)6 u 

.£l60)l_.S(g)li) (J^6 otSu, ©^5a(g)Lb S6II60)6U60)LU 6iilL_(5i6iilL-Sl_6OT. ^gOTfl^ 6T6OTgU60)l_LU 

10 6ii(_[5i_ ^guueiiSLD 6 T 6 OT.S(g) lBIsbu Gluffliu i!jlgffff60)6OTLun:a ^eoLD^gjgfilLLgj. 
gT[a(g) S[B*rjan:60CTg^.S(g).? OffgOTJDng^Lb, ©gneugrreii ^guueiiLb GlaneoOTL CBuij 

gT[aa(^a(g) Testing-4(g)^ S^60)6 iilij1^60)6u greOTgu ^gULiOleiilLLeOTij. 

u^.sa.LJUi_n:^ gT6OTa(g), SgueoeuLijl^eunLD^ 
ai^6OTLDn:6OT arreuaLLLDrraSeii gT[a(g) Oa^eOTJnng^Lb O^rrLijff^lLurra 

(glgnaffl-sau ul_Si_6ot. Testing GlffiuLU (g)60)JD^^ ^guueiiLb OangoijTLeiiijaSgn: 
Sungjii gTgOTJDrrijagrr. 6T6 otS6ii Testing 6T6OTgu OffneOTgOTn:^, gT6OTgu60)i_LU 

^gUU6II^^^(g) S6II60)6U -^leOLUUgJ ai^gOTLD 6T6OTgU S^ngOTJDSgU, SgUgU §J60)JDa6fii^ 
gT6OTgU60)l_LU ^JD6OTa60)6TT gUgnij^gJ-S GiangrrgTT (J^l^-gll Gi<?LLjS^gOT. 


©gjgugog Gi^n:i_ij.?#lLun:a [BfigOT a^gu-sGiangocrSL gu^^ GNU/Linux, MySQL,
HTML, CSS, JavaScript, Python SungOTJDgogii gTgOTa(g) go)aGff>n:(S\aa^ Gi^n:i_[a.^gOT.
^gogiiaSgrrn:© Bigdata, ELK, Hadoop, Pig, Hive, Spark Sun:g(jrJDgii^go)JDLL(Lb

Gi^n:i_[a.^SgOTgOT. ^sfrliLina gr^^ ^([5 course-ii SffggQl^gogu. 

course [BL^gjugiiijagrr Gl^ngoaLuna LungogOT gQlgogu, (g^gog 

gClgogu SaLLnijagh. gig^Sgu effLiy^ G^nLa-^SgOTgOT. 

a^JDgu^goJD G^n:i_ij^§j ulijI^^I Os^LugjLb, ^gogii u^jSI agorfrlLuib iBlgOTsfii^^^ 
gT^ijGiangrrgrr^ G^nLa-^SgOTgOT.
Kaniyam.com Bigdata Machine learning u^jSIlu aL_(S\60)[jff>60)6n:

@^§J60 )jdlij 1^ Sa^g 6ijl([5LJU(j^6fr6TT ueuij u^Seugu 
Saerreijlaeogn CT(I^lji!j1lu eueocreocrLb ^euijagh ^eoeOTeui^ib, "@0 §j6O)jdlij1^ 

Sffg S6ii60OT(S\GlLD6OTJDn:^, CTeOTGleOTeOTOeu^eumi Seueocr^ii? CT[a(g) 

S6ii60(5T(5iii? uriL^ ^LLraaeoena (j^iy-sa CTeiieueneii ^ngrr ^(giii? 

a60?M«frl^ gjeoJDaSa i_i^lu ^ugn^ ©0^(g)6rr S^giyiuna gjeo^LU (j^iyLL(LDn:?" 
CT6OTGljD^6un:Lb SaL_i_n:^, ^60)6ii ^60)6OT^^^(g)LDn:6OT u^60)6U ScBijiyLuna Os^rr^djleijlL 
(j^iyLurrgj. ^6OTn:^ CT^Seurrgrrg^Lb §j60)JDff>(^a(g)6rr gjeo^LU (j^iyiLiii 

CT6OTuS0 ueui^ii bigdata, data science, machine learning, deep learning, AI 
SuneOTJD §j60)JDff>(^4(g)6rr Sa^g 6iil^5uuuu(5l6ii§j eugSeu^a^ ^6 otSjd. ^syrn:^ 
^^^Gta56OT ^gneii a5^JD60)6ULL(Lb, S[Brr^60)^iL(ii Oa^eueiilL S6ii6DOT(5lii. 

uiySiJj uiijl^^l CT(S\uu^6(jr Qy)6U(j^Lb, u^Seugu (g)(i^.sff>6frl^ 

@60)6D0T^§j u[aa56frlLJU^6OT Qy)6U(j^Lb Gtarrgrrgrreumi. 


CT(S\^§j.san:L_i_n:a CBmi eiilLDnsfrlLun:.^ eijlLDneOTii ^ll eiJli^^LDilileOTn:^, 

ScBijiyLurra ^^5 uujI^^I eoLDLUib OffeOTgu CTQ^gjeiJlL© eijlLDneOTii 6 iili_ 

(j^iyLurrgj. ihl^eiieoOTiy, ©^5 a^-sag eunaeOTib, CBneOTigi ffaag eurraeOTii 

Sun:6OTJD6ii^60)JDGlLU^6un:Lb ^Liyu ©Lu^^graaeoerru u^jSIlu ^lyuueoL 
^j5l6iil60)6OT eugnij^gj-s Oarrerrerr SeiigDoriSlLb. OlgOTgOTij ©lu^i!j 1 lu^, eunsfrlLU^, 
an^jSlLU-saBgiilLU^ Sun:6OTJD6ii^60)JDLJ u^jSI <#IjSI§j Oarrerrerr S6ii60OT(S\Lb. 

ijl 6 OT 6 OTSg eiJlLDrrgOTLb ^l_i_.s Oangfreu^^^ rfraaerr ^Lunij 

SungOTgu © 0 ^ 6 O)ff>LU gjeoJDagfil^ Sffg 6ijl([5Lbi_16ii§j greOTU^jii, eiilLDngOTLb ^l_( 516 ii 60 )^lj 
S ungOTJDS^. © 6 ii^jSI^ gjeo^LU 6 ijl(_[ 5 Lbi_iS 6 iin:(_[ 5 a(g) u^Seugu (j^ 6 ot (£lu^^60)6OTa56rr 
26rr6TT6OT. ^60)611 fllgOTgUl^^LDflgU ^ 



1. GNU/Linux: 


djleOTaem SeueDOTiJlLb. ©gjSeu uJDUu^^(g) ^i^uueoL. 

^^gU60)i_LU command line, vim Sua6OTJD6ii^jSl6OT iJ’gj 2[aa(^.S(g) eijli^uuLb 
6j^ui_S6ii60OT(S\Lb. ^aSeu windows-g (j^(i^6ii§jLDaa LOJD^gjeiilL© dileOT-sertil^ euai^^ 
0^ai_[a(g)[aa6rr. Windows- 


^aSeu §j6ii[a(g)[aa6rr 


http://www.linuxtraining.co.uk/download/new_linux_course_modules.pdf


http://freetamilebooks.com/ebooks/learn-gnulinux-in-tamil-part1/


http://freetamilebooks.com/ebooks/learn-gnulinux-in-tamil-part2/ 


2. Networking basics:


6ii60)6uui!jl6OT6OT^a6rr u^jSIlu ^g!.LJU60)La6rr, IP Routing, Firewall, DNS, 

VPN Sua6OTJD6ii^60 )JDLJ u^jSI ^Ij 51 §j Glaaerri^aaerr. 





http://www.linuxhomenetworking.com/ 


3. Programming: 


"(glrreurraaLb CT6OTa(g) eurrrrgj" CTgurb CTecOTeocTii Python-dHi^^gj 

§j6ii[5J(g)[aa6rr. ihlaeiiLb CTeffluj Glmn:^. aaaeh ulu^^60)6ot aeuuLDrra 
[fa^6ijl(S\Lb. [faaerr u^Seugu Lomij 6iil^60)^a60)6rT GlffLLJ6ii^^(g) ©gjSeu a^eiiii. 


https://pymbook.readthedocs.io/en/latest/ 


4. SQL/NoSQL/JSON 


ljl6OT6II^5Lb aL_L60)LDLJI_l 6iil6OT6II^ GlLDn:^a60)6TTU(SQL) 60)6II^§ja 

Glarrehi^Kjaerr. 


MySQL 


NoSQL with Redis, MongoDB


http://freetamilebooks.com/ebooks/learn-mysql-in-tamil/





http://freetamilebooks.com/ebooks/learn-mysql-in-tamil-part-2/


5. Visualizations 


Gi<sa6rr6ii^^(g) ej^jeuaem: 6ii60)gui_[aa60)6TT 
2([56iiaa(g)6ii§j u^jSI^ 60)6ii^§j.s Oaaerri^raaerr. Matplotlib 

LD^auii) Kibana ^-^Lueoeu aerrerreOT. 


6. Cloud services 


Bigdata ai^efJlagrr ^60)6OT^60)^LL(Lb, [bld§j Loiyaaeorfrlsfrl Qy)6uSLD Glaaerrerr 

(j^iyLLiii. (g)60)JD^^UL_s=Lb 4GB 8GB eueogiijleuaeOT RAM S^eoeu. 

SiDg^ii AWS, DigitalOcean, Google Cloud Platform, Azure ^.^1 lu 6 ii^ 60 )jdlj u^jSI^
60)6 ii^§j 4 Gaagrri^aagrr. ej^aeuG^ai^ cloud service provider- 

aaiaya Gaaerreugj ©eAgyib -^IJDUuaa ^60)LDLL(Lb. <#IjSI§j Offeueii 

^eOTag^ii ug6iiaLijl^60)6u. iq^LU virtual ai^eijlaeoeTT si^eua-s.^, TB ^saeiileuaeOT 
^g6iia60)6TT aeuuLoaa process GffLULU cloud services a^eiiii. 


7. The Big Data tools 


Hadoop, Spark, Pig, Hive, scikit-learn, TensorFlow SuaeOTJDeu^eoJDU
u^jSlGiLi^euaLb ^IjSlgj a^aya Gaaeai^raaga. @ 60 ) 6 iiSlu [Barb efjli^Lbiqii 

26 Ua^§J.S(g) CBlieOLD Os^^g^LD. 



http://freetamilebooks.com/ebooks/learn-bigdata-in-tamil/


http://tutorialspoint.com/


https://www.kaggle.com/ 


8. Maths/Algorithms 


uerr^frl, a^g^iJlujlSeuSiu aeooia^ eiilLLgj &T6OTgu CBarb 

a60CTa(g) [BiieoLD 6iil(S\6ii^^60)6u. i_i6rr6frlLijlLU60)6uu u^j51 ©eOTguii G1^6frl6iiaaa 
Giaaerri^iaaeri. ©gjSeu algorithms CTeiieuagu 2 ([ 56 iiaaauu(S\'^ 6 OTJD 6 OT 
CT 6 OTU 60 )^u GiaaeriOT: a^eiiii. 


9. Community Contributions 


LD^JD6iiija(^4(g) ^([5 6ijl6)^LU^60)^LJ u^jSI-s Oaa(S\a(g)LbSua§j^a6OT, CBaib 

^ 60 )^LJ u^jSI ©eOTguii Giaagrr-^SjuaLb. ct6otS6ii cfiaagfr 

Gifta 60 OTi_ 60 )^LJ u^jSI ^6OT(j^Lb ^([5 u^6ii GleugffliijllBtaaerr. ^(^-^g^grrgrr 

(g)(i^fta 6 frl^ ©60)60CT^§j, Ss^ij^gj a^gU'*Gl'*n: 6 rr(^[aa 6 rr. Meetup.com 
2^61(10. 





@ 60 ) 6 iiGlLU^ 6 un:Lb un:ijLJU^^(g) LD 60 ) 6 uuu[ra ^eiiGleurreOTJnrra GlffiuLU^ 

Gl^n:i_[a(g)[aa6rr. cfraaerr 6i]l([5Lbi_iLb aLug^^^(g) araaeroerr 


@60)^ iJ’eoOT^ii li’eoOT^ii [§160)6OT6 ii Glarrerri^raaerr. 
Machine Learning - Online Free Course - Andrew NG


https://www.coursera.org/learn/machine-learning 


@60)6DOTLU eii^u unLii), LBlaeiiii ULUguerrengj. ^ 6 iijd eiilLn^gagfr. 
CT6(jru§j eugrrij^gj 6II([5.^6 otjd ^([5 gjeoJD.

^([5 a6afTl«frla(g) a^Qluugj, ^j^leii i_iaL_(B6ii§j, i_iaL_i_uuL_i_ ^iSleiilgOT 

^i^lju 60 )i_li!j 1 ^ a 6 ®frl«frla 60 ) 6 TTSLU (j^i^ 6 ijl 60 ) 6 OT SLD^Glan:6rr(^LDn:gu Os^Lueiigj SungOTJD 
u^Seugu 6ijl6)^LU[aa60)6TT ©Lu^^g6ii^.s arreocreumi). LD«frl^6OT OffLU-^gOTJD 

S 6 ii 60 ) 6 U 60 )LU Glgiigyii (Sl[j^a>6rr gT(i^^ aeorfrlsfrleoLUff GlffLULueoeiiLJU^gOT OuLuij 
©iLi^^geiii^a abpp^ OuLuij ^[rsfrlLU-saib (Automation). 

LD«fii^60)6OTLJ SuagOTgU a6Drfrl«fiia60)6TT SlLia^laa 60)6II^§J, (J^l^6Iia60)6mL(Lb ^^60)6OT 
60)6ii^S^ 60)6iiLJU§j, ^eiieiiagu grC^-s-SLiuC^ii (j^i^-giiaerr ^LU^^g^^gOTinaa 

^^eUULD^ ^jSleiilgOT ^l^LIUeOLlijl^ ^60)LD6II^^(g) gT6OTGl6OT6OT6OT GlS^LULU S6II6DOT(5tli, 
^eiieuagu Siua^laa eoeiiLJUgj greiieuagu ffu^^LuuuLLgj, ^^g^grrerr gii^(j^go)JDagfr 
gTgOTgOT, SaaL_ua(S\agrr gigAGlgOTgAgOT grgAugj SuagAjD ^go)gOT^go)^LL(ii) gfllgrraigigiiS^ 
©iLi^^ggiii^a ab^nj^ 


©gu^goJDGiLu^guaLb GlffLLJgii^^(g) Glguguii ^agu^ ^j51Sgiia(S\ 

LDL_(5tLD^6UaLD^, agorfrl^LD, l^gUgfitlijlLU^ SuUgOTJD LD^JD gJgOTJDagfiig^Lb .^IjSlgJ ^l^UUgOTL 
^jSlgiilgOTgOT gugrrij^gja Glaagrrgn: SgiigO(jT(S\ii. ^uSuagj^agOT CBiii-Da^ aguuLoaa 
ag®fTl«frla(g)a Giaa(S\aa (^i^LLiii. SiDg^ii) maguii ^L^tSleoguaig^a^LD, 

^ggiia(g5a(g)ii) gj^u ^ld§j Saa^gogm: (j^i^giiago)gTTLL|Lb, agorfriLJi_iago)grTLL(Lb ldu^jSI 
gii^[5j(g)giiS0 ©LU^^ggu^a a^JD^JlgOT <#Ijdui_ilj ugoOTi_i ©gw^SiLi Adaptivity 

gTgOTgu a!_gu6iirj. gr^Gl^^^ gugoaiLiagOT ^L^(glgo)guagrr ©LU^^ggu^a a^JDg^a(g) 
gu^ gii(g)a.^lgOTJDgOT gTgOTUgo^a aagoOTguaib. 


LDsirt^gUgoLLU ^guugu ^jSlgua^ OaLULuaaLi^LU OaiLi^agrr: giiaagOTii ^l_(S\6ii§j, 
^(^gui^gOLLu (giggogua SaL_Si_ ^gogrr agKfrluugj SuagOTJDgogiiOLU^guaLb ^(^guij 
^gOTgijgOTLLU ^guugu ^iSlguag^ii), 1-1^^ aagjijLu^^ag^Lb GlaiLiLua a!_i^Lugo)gii. 



ScBiji^-Lurra (glg^aerr CT(I^^ a56®ffl«frla(g) (j^i^Lurrgj. 

^gUU6ii^60)^LL(Lb ^jSl6ro6iiLL(Lb Glan:(S\^§J^^n:6OT CBmi aeorfflsfrleouju 
S6II60(jT(Bli>. SlDg^li) CT^§J60)JD ff[rij^^ 6ijl6]^LU[5Ja60)6TT CBmi a6®ffl«frla(g)LJ l_iaL_l_ 

6iil([5Lbi_l.^SjDn:SLDn:, ^0 §j6O)jd eu^g^^ijaerr Qy)6ULb a6®ffl«frla(g)LJ Surr^LU 

^jSl 6 iil 60 ) 6 OT 6ii^[aa S6ii60(jT(Bii. @ 6 O) 0 Slu domain expertise CT6OTgu an_gu6iiij. 


LDerfii^ ffft^60)LU lS’jSI Off lulu S6ii60otl^lu Offiu^fferr: eiileoOTOeiieffl, y,Sffa6TTii, 
^jSleiilLu^ SuaeOTJD ^60)6 ot^§j^ gjeoJDffefrig^ii u^Seugu Sffa^eroeOTfferr 
[BL^^LJU(5i'^6OTJD6OT. ^60)611 O6Il^jSl60)LU^ ^(Lg6II S6II60OT(5iOLD6OTJDa^, 6J^Off6OTS6II 
S^a^eiileoLU^ ^(j^eijliu (j^^ero^iu Sffa^60)6OT (j^L^eiiffeoerr CT(5i^§J ^ijaiu^gj 
06Il^j51ffffa6OT ff6®MLJl!jl60)6OTff ff2_JD S6II60OT(5ili. 2^ag60CT^§Jff (ff) LD([5^§J6II^ 
§j60)JD6roLU CT^^gJ-sOffaecOTua^, i!jlgff6ii^^6OTSua§j OueoOTffefiieOT @a)UL4 
CT6OTU§j ua^ff(ff)U ua^Luaff (ff)60)JDLJU^^(ff) ©gjeueog 

Ljlgff6ii^^6OTSua§j ueuSffaL^ffffeocTffffasa OueoOTffefrleOT 

^LLJ6iijSlff60)ffff60)6TT CT©-*-* S6ii60OT(5iii. ^6ii^au6rr ^eiiOeuai^ OueoOT^ii CT^saa^ 
@iD^^(i5'S<£lJDa6rr, ct^^60)6ot Oueocrfferr eueoffiuasa ffageocr^^a^ 
©JDff.£ljDarrff6rr, ct^^60)6ot eueroffiuaeOT ffageocTLafferr ©JDLJL4ff(ff) 6ii^6ii(ff)ff.^l6OTJD6OT 
CT6OTU§j SuaeOTJD efSle^^iuLaffeoerrOLU^euaii) ffeoOT© fllL^ffff S6ii60OT(5iii>. ©O^^euaii 
a60OT60)LDLljlS6uSLU LDSfil^ ffff^ff(ff) ^UUa^ULL OffLU^^a6OT. ^ffSeU ©611^60)JQ-ff 
OffLLJ6ii^^(ff) ff 6 ®frl«frlff 60 ) 6 rrLJ u^ff.^, LDa^ffliuasa pattern-^ 

^geiifferoerrff ff60OT(5iO^!-'S<^l6OTJD6OTij. i!]l6OT6OTij ^6 ii^60)jd CT(5 i^§j ld^5^§j6ii 
eu^g^LBLifferr uffl#djl^§j "©JDUL 4 ff(ff) 6ii^6ii(ff)ff(ff)ii) ffageorfrlfferr" ct6ot (Lpi^-eii 
OffLLj.^l6OTJD6OTir. flleOTsaij ^L^LJU60)LLi!jl^^a6OT ^^Suagj ffijflleorfrlLuaff 

6ii([5.^l6OTJD OueoOTffefrlLii) ©6 ii^j 51^ ejS^gUii ^6OTau O^eOTULua^ SklL suSsa 
^au60)6ii .#l.^ff60)ff Offiugj 6iil(5i'^l6OTJD«aij. ^ffSeu^aeOT ^^Suagj ua^ff(ff)LJ ua^ 
Ou60OTff(g5ff(ff) ^aU60)6II .#l.^ff60)ff OfflULUUULLag^li, @JDUL1 CT6OTU§J 

(J^(l^6II§JLDaffff (ff)60)JD^§J eiilLLgJ. 


@Lu^^g[aff(g5ff(ff)ff ff^flluugj CT6OTU§j "6ijl6U[a(ff)ff6rr CTeiieuaau ff^.^6OTJD«a" 
CT6OTU6O)0 ^L^uu60)LLuaff 60)6110®^ ^gaiuuuLLgj. i!jl6OT6ii([5ii) Oui^ii 

SaaLuaC^-aerr (j^ff-^LUu u[a(ff) 6ii.^ff.^l6OTJD«a. 


Bait shyness: ^lieoiD [aijff(ff)LDaau a6®a6i]l60)«aff ffeoOT© ^@ff^^



Glurri^err. ^^n:6ii§j ^i_6iiuul_i_ ^6OT[r^ urro'uu^^^a 

a 6 Iigaa 2 _l^LU 61l6roaujl^ ©^5a(g)Lb Si6ro6IIU4,L_l_LJUL_l_ a60OT6Iia6ro6TTa aeoOT© CTdJlaerr 
Glarrerri^LD. ct6otS6ii ^6ii^60)jd (j^(i^^[ra aL_Glan:6rr6ii^^(g) (j^eOTsyrij, 
S60OT6i]l6OT ^([5 ^IgU U(g)^60)LU &T(S\^§J ^^ 6 OT[r^ ^ 6 OTa(g) 

Lun:Gl^[r ([5 u[r^LJi_iLb ©^60)6uGlLU«frl^ 260OT60OT6un:Lb CT 6 OTguii, u^lji_i aeoOTeocT 

SeueocrLmi CT 6 OTguii (^^!. 06 ii(B'S(g)Lb. OleOTeOTij iJ’eoOT^ii Surrey jd ^([5 

S60OT6i]l60)6OT LDgU(J^60)JD ang^lLPSurTgl. ^n:6OT 6J^a6OTS6II Sffn:^60)6OT 

(j^i^6iia6ffl6OT ^i^lju60)i_li!j1^ 260(jT60CT6un:LD[r S6ii60OTi_n:LDn: ct6ot (^i^Gl6ii(S\'S(g)Lb. ©gjSeu 
a^JD^Jlg^LD ^60)i_GlugU'^iD§J. 


LDflGlUl^LD ^g6Iia6ffld)l([5^§J ^(_[5 #lgUU(g)^60)LU <a6®ffl6®fflLUn:6OT§J 

^grriLiLD. ©gjSeu sampling CT6OTUu(5lii. U(g)^ ^geiiaerr training data 

CT6OTgu ^60)^aaLJu(5iii. ^^^[J6iia60)6rr s60(jT60OT6uaLb SeueoOTLaib CT6OTU§j SuaeOTgu 
6ii60)aLJu(5i^§J6iiS^ labeling CT6OTUu(5iii. @10(^^19.611^60)611 60)6ii^§j 6ii(^.^6orJD i_i^lu 
^ g6iia60)6rra a6orfflLJU§j Predicting the future data 6T6orLJu(5iu). @§jSua6orgu 
u^S6iigu U0[aa6rr @Lu^^g6iiL^a ULU6oru(5i^^uu(5i'^6orJD6or. ^6ora^ 

@§jSua6orJD ^gLDa6or[aa6rr #l6UffLDLULb ^6iiJDaa LDajSl6ijli_6iiLb 6iiaLULJi_i6rr6TT§j. @^^(g) 
a^ag600TLDaa i!jl6or6ii(^Lb SaaL_uaL_60)i_a an_JD6uaLb. 


Pigeon's superstition: i_iJDaaa6fii6or 06iiJDa6or a6®fflLJi_i 6r6orgu CBULb @60)^a ai.JD6uaLb. 
^(^(j^60)JD B.F.Skinner 6rguLb LDS6ora 0^§j6ii6i]lLU6ua6TTir i_iJDaaa60)6TT 60)6ii^§j ^llj6ii 

^60760)JD [BI_0^6Oraij. U6U l_lJDaaa60)6TT ^60rJDaa (3n_600Tl9^(g)6rr 60)6II^§J , 

^60)6Iia(^a(g) (gijSlufllLL aa6U @60)l_Gl6II6fiia6fii^ ^6007611 Glff607gU Sff(^l_DagU 
^asfTlLUra.^ ^60760)JD ^60)77)0^^71. ^gJOIlLD ffiJllLlUa GlffLU^UL© 

(J^60)JDLLlLb a6OO76II6f7l00J 6II^00J. l_lJDa4a677 06074(0 ^6ijGl6Iia(0(^6O)JDLLlLb ^6007611 
67LJU19 6II(041 jD 0J 676O7U6O)04 ^6007(510119-4^ ^^S[B[J0^4 ^0J 67607607 
(^ffLLJ0j(^aa6o4l9(0^00J 676O7U6O)0 a6II«f7l4a0 (^0ai_[5j4lLU0J. ^0a6II0J ^(0 l_lJDa 
0a6OT 06O)6ULU6O)ff4(04Sua(^046Ua4 a60076Il 611(06110^4, 0a6OT 
06O)6ULU6O)ffUU0a40a6OT 06074(0 S60076I1 6II04lJD(^06O7gu4, l_D^(^JDa0 l_lJDa ^0J 
(0^00j4(^aa6oo7i904(04Sua(^046ua4 a60076ii 6ii06ii0a4 ^06O7a40a6OT a60076ii 
6II041jD(^06O7gu4 (gl6O)6O700j4 (^aa6OO7l_0J.. 


^607a4 @60)6iiujlg6007(5l4 0^(^ffLU6uaa0 (^0ai_ijua6O7 ^ootSjd ! (temporal 



correlation). seocreoLDiijl^ unij^^n:^ ffLbLD^^(^Lb -^eoLLungj. 

i_lJDn:S 6 ii[r ©6ij6i]lg60CT(S\'*(g)ii> ^eiiJDneOT ^([5 Gl^n:i_iji!jl 6 ro 6 OT seocTLna.^, 
^^6OTI^LJU60)I_Uj 1^ a6®fflLJl!jl60)6OT 6i]l(S\'^tD§J • ^n:«fflLU[5J.^l 

GlffLu^u(5lii> ScBijii LDn^JDUUL© s60OT6ii 6iig^ Gl^n:i_[a.£lLU§j. ©eo^LUjSlLunu 

i_lJD[raa6rr ^60)6ULU60)ff^§ju unij^gjii, uno'^gjii aeocreii eugn^^n:^, 

CT60)I_ (g)60)JDLU^ Gl^nLKJ-^LUgJ. S60OT6I1 6II([5.^6OTJD ScBtF^eO)^ ffiJllLinaa 

angeocTii. ©eiieunguii CBLOgJ 

[Bi_^§j6iili_a<gn_i_n:§j. ^^Glffiueuna [B60)i_Gluguu3 0^ai_iri_i6roi_LU tglai^eiiaeffleOT 

^gjSeu cbld^j [BLDa(g) ^6ffla(g)Lb 

aeorfrlLJuna <gn_i_n:§j. 


CT^jliOgueoLLU s^rrgeocT^ero^SiLi L^ecOTlBii unir^^n:^ ffn:LJi!jlL_i_6iii_6OT, ^([5 
LBleOTffnga aLbfllii!]!^ ^i^ul© un:^LJU 60 )i_.^JDGl^«ffl^, ©eiieijlgeocTlB'ScgiLb ^([5 
Gl0n:i_rri_iLb ©^eroeuGliLieOTUgj CTd)la(g)^ Gl^iJliLiLb. ©gjSuneOTgu ejdjlujlLLb 
an: 60 OTLJU( 5 l'^ 6 OTJD ^([5 ^l^UU 60 )l_LUn: 6 OT ^jS160)6iiSlu CBmi a6®ffl«ffla(g)LJ i_iaL_i_ 
S6ii60(jT(5lii>. @§jS6ii Inductive bias CT6OTgu ^ero^aauuQ'^tDgJ. Biased CT6OTJDn:^
unguLffii) unijuugj, ^([5 ^eroeuuuLffLDua ©(^^uugj CT6OTgu Gluni^err. Inductive bias
CT6OTJDn:^ ^6OTLDn:6OT (j^i^eiiaeroerr ^uui^Slu ej^gu-sOanerrermiD^ ^jSleiileOT 

^i^lju60)i_u!j 1^ unguLffu uniruugj CT6OTgu Gluni^err. ©gjSuneOTJD ^jSleiileroeOT 

a6®fTl«fria(g) ^6fiiLJU^^(g) ^^gjeroJD eu^g^cBrraerr S^eroeu. ^eiiijaSen: 

domain expert CT 6 OTgu ^60)^aaLJu(5i<^6OTJD6OTij. 


6ii^aa^JD60)6ULJ i!jl6OT6ii([5Lb 4 iljliJlaaeuaLD. 


1. Supervised vs Unsupervised Learning: ^^5 a 6 ®fflLJi_i [B 60 )i_Glugu 6 ii^^(g) aerrerf© 
CT6OT6OT6Iin:a S6II60OT(5ili; GleueffllU© CT6OT6OT6Iin:a S6II60OT(S\li>; 

@6ij6iilg60OT60)i_LL(Lb eiil^aefiieOTUi^ ©eoeocraa S6ii60OT(5iii> SuneOTJD 

^ 60 ) 6 OT^ 60 )^LL(Lb aeocTaneKffluugj supervised/structured 

learning CT6OTLJu(S\ii. a^ag 60 OT^§ja(g) CBmi a 60 )i_ 6 ]J^a(g)ff Glff6OTgu 
Gl6ii60OT60)i_aan:LLj 6iin:[a(g)6ii^^(g) (^6 ot, (glJD^6O)0LJ un:ij0§j ^eiieunSjD 

^i^gj«fri60)LU Seuffnaa .^errefiiLJ unijuSuaLD. uaeoLDLuaa, LBli^gjeuaa 

6iia[aa6uaGlLD6OTauLb, ai^eOTLoaa U(i^LJuaa S6ii60OTi_aGlLD6OTauLb 

(j^i^6ii GlffLuSeuaLD. 



• ^([5 Gl6ii60OT60)i_aan:60)LU 6iin:[5ja6un:LDn: S6ii60OTi_n:LDn: &t6ot (Lpi^-eii GlffiuiLiLb 

an:g6®ffla6TTn:6OT (gl/Dib ^ 6 OT 60 )ld Surrey jd 60 ) 6 ii domain set CT6OTUu(5iii. 

@6OT6iiSlu X CTguii aerrerfLi^^ aaeocruuQii. 

• euaraaeuaii), SeueoOTLaib CTguii LD^LJi_ia6rr labels © 60 ) 6 iiSlu 

Y CTguii OeiiefrlLiJLi^^ ^60)LDLL(Lb. 

• ^([5 mapping function -^ 6 ot§j aerrerfLi^^ aerrerr LD^LJi_ia 60 ) 6 rrLL(Lb, 
GleuefilLilLi^^ serrerr LD^LJi_ia60)6TTLL(Lb Gl^ai_iji_l GlffiuiLiLb. G1 ld6OT60)ld -> 
euaraaeuaib, ai^eOTii -> SeueocrLULb, uaeoLO -> eua^jaeuaii, U(I^lji_i -> 
SeueocTLaLD. ©gjSeu rules set CT6OTUu(S\ii>. f: X -> Y 

• Rules set eiil^aefrleOT ^i^lju60)i_lij 1^ a^auaGlaaerreugj learner 

CT6OTUU(5lli>. 

• Learner a^au<sGlaa 60 OTi_ eiile^^LUKjaefrleOT ^i^lju60)i_u!j 1^ i_l^^aa 6 II([ 5 .^ 6 otjd 

aerrerf©'*!®'*® OeuefrlLij© CT6OT6OT6iiaa ct6ot (rpi^-eii Glffiueiigj 

predictor CT6OTLJu(5lii. ^^aeugj i_i^^aa LO^GljDai^ Gl 6 ii 60 OT 60 )i_aaa 60 )LULJ 
uarra(g)U)Sua§j, ^^60)6 ot iJ’eoOTQii Sffa^60)6OT GlffiuLU^ S^60)6 iili!j1^60)6u. 
Sffa^60)6OT (j^i^6iia6frl6 OT ^i^lju60)i_u!j1S6uSlu euaraaeuaib, SeuecOTLaib ct6ot 
( j^i^Gl6ii(5l'S<s6uaLb. 


@^60)6OT classification LD^auii regression CT 6 OTau 2 efil^maaij fllijlaaeuaib. 
Classification-^ ld^ui_ 1 6jS^a ^([5 eueroaujleOT ^601011410. Gl6ii60OT6roi_aaaLLJ 

euaraaeuau) SeueGOTLaib CTguii ©geoOT© 6ii60)au!jl6OT ^60 )ld6ii60)^ 
a^ageocTLoaaa GlaaerrerreuaLb. Regression-6OT ld^lji_i ^(45 s60OT60)LDLua6OT 
(J4)(i^ LD^uuaa ©(45a(g)Lb. euulil^jSlg^errerr (gi^^eo^eoLU scan GlffLugj ^gaiu^gj 

LjljDa(g)U)Sua§J CT6ij6II6TT6Il CT60)I_ CT6OTU60)^ kg-^ 

<gn_au 6 ii 60 )^ a^ageoOTLoaaa GlaaerrerreuaLD. 


Unsupervised Learning - ^ Gleuauu) a6rr6TfL_(5l'a'an:6OT LD^LJi_ia6rr ldl_(S\®I-D 
aa60CTLJu(5lii. Gi6ii6frliUL_(S\'a'*n:6OT ld^ui_i CT6OT6OT6iiaa ©(45a(g)GlLD6OTSjDa, 



CTeiieiil^aeffleOTUi^ ^ 6 roLDLL(GlLD 6 OTSjDn: 6ii60)g(j^60)JDiL(Lb .^ 6 roi_Lun:§j. Gleuguii 

s6rr6TfL_(S\'*'Sn:6OT LD^LJi_ia60)6TT LDL_(S\Si-D ^[jrTLU^gj, ^([5 pattern-ga

a60OT(S\fl^!-^§J ^^60)6OT Gl6II6fflLl5L_(5\'S<Sa6OT a6®ffluuaa [BLD.S(g) Gl6II6fflLJU(B^§Jli>. 
@^ 60 ) 6 OT clustering association CTguii) ©geocr© eijl^LDnau fllfflaaeuaii). 

6iin:i^a6roaLun:6TTija6frl6OT ej^JDnijSun:^ 6ijl^u60)6OTLun:.^6OTJD 

Glun:([ 5 L_a 60 ) 6 TTa 6 ii 60 )auu(S\^§J 6 ii 6 ro^ clustering-a(g) a^ageocTLDuaa 

Glan:6rr6TT6un:Lb. 6ijl^u60)6OTLun:.^l6OTJD ^geiiaeoen ldl_(S\®I-D aerrerfLua 
CT(S\^§J'*Gl'*n:60OT(S\, Suna-^SeuSiLi Glff 6 OTgu eiil^ueroeOTUjleOT Suna-^eroeOTa 

(sales pattern) ©eiieungu a 60 OT(S\i!jli^aaLJUL_i_ 

6ijl6iig[aa60)6TT CT(S\^§J'*Gl'*n:60OT(5l, LDU^fflLuaeOT Seugu 
Glun:([ 5 L_a 6 frl 6 OT L5G1^^6un:Lb 6iin:i^a60)aLun:6TTija(^a(g) eijli^uum G^aeOTguii CTsyra 
aeorfrluuero^ association-a(g) s^ngeocTLonaa GlanerreneuaLb. Qy) 6 ULb 

6iil([5LJU[5Ja6frl6OT ^60)LDLL(Lb U^GeUgU Ouai^LaefileOT 6i]l^U60)6OT60)LU CBaii) 

^^afflaaeunii. ©gjGeu unsupervised/unstructured learning CT 6 OTUu( 5 lii>. 


Structured Unstructured ©60)6ii ©g60OT(5l'*(g)ii> @60 )i_lij 1 ^ ^60)ld6ii§j semi- 

structured learning ct6otlju(51u). ^^rreugj ^(^.^leu ^geiiaerr label GlffLULULJUL_(5lu), 
LD^JD60)6ii label GlffLULUuuLnLDg^Lb an:60OTUu(5lii. s60OT60)ldu!j1^ 

6ii([5.^l6OTJD ^geiiaerr [BLDa(g) ©Lb(j^6O)JDLi!jl^0n:6OT ueuGani^aaeocraaneOT 

^geiiaeroerr ^gniu^gj label Glffiueiigj ct 6 otu§j ff^^LULO^JDgj. ^eiieurrGjD 
^60)6OT^60)^LL(Lb labelGlffLULuniD^ 6iil(5l6ii§jLb a^eungj. (j^a.^LULDn:6OT60)6ii label 
GlffLULUUULLUa G6II60OT(5lli>. @§jGun:6OTgU label GlffLLJLUUUL_(5lli GiffLULUUULULDg^Lb 
@([5a(g)U) ^g6iia60)6TT CBaii) Gm^aeocTL ©geocrlB eiil^raaefrlg^Lb ^gnLueunii). msfrl^/ 
l 61([5 <s (j^araaefileOT a6®frlLJi_i, (gg^aefileOT a6®frlLJi_i Gun:6OTJD60)6ii ©Lb(j^6roJDu!jl^^n:6OT 
^60)LDLL(Lb. Structured (j^ 60 )jdli!j 1^ label GlffLULUuuLLeu^eoJD LDL_(5lii training data- 

Glan:(5l^§J, ^06 oti^uu 6 O)i_u!j 1^ LD^JD60)6iia60)6TTa aeorfrlaaeunii). Unstructured 
(j^60)JDLJUi^ label GlffLULUuuLL GiffLULUuuLn:^ ^([5 

pattern-ga a60OT(5lfllL9.^§j ^60)^ eoeii^gjii) aeorfrlaaeunib. 


2. Passive vs Active Learning: 6 II([ 5 .^ 6 otjd ^geiiaeroerr ^uui^Glu ej^gu-sGlanecOTlSl, 
Glan:(5l'*'auuL_i_ eiil^aeffleOTUi^ ^rraiLi^gj passive learning 

CT6OTUU(5lli>. ^(5 L6l6OT6OT@ff^ Spam-^ ©^60)6ULUa CT6OT Gffn:^LJU6O)0 
a^ngeocTLonaa GlanerreneuaLb. CTeroeuGlLu^eunii) spam-ga (g)jSla(g)Lb 

6iiair^6O)0a6rr ct6otu§j a6®frl6®frla(g)a a^i!jlaauuL_(S\6ijl(Bu3- ct6otG6ii i_i^0n:a 



6 II([ 5 .^l 6 OTJD ^([5 L 6 l 6 OT 6 OT@ff^ ©^^60)aLU 6 Iin:O'^ 6 O) 0 a 6 ffl^ ejS^gULD ^6OT60)JDa 
Glftn:6OOTi^([5^0n:^ ^^60)6 ot spam folder-a(g)Lb, ©^60)6uGlLU«frl^ inbox-a(g)Lb 
niisaflS^ii-D. 
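The passive-learning example above (a mail filter that already knows which words mark spam and routes each incoming mail to the spam folder or the inbox) can be sketched roughly as follows. This is only an illustration under stated assumptions: the word list `SPAM_WORDS` and the function name `route_mail` are hypothetical, and the rule "any one listed word sends the mail to spam" is a reading of the passage, not a real filter.

```python
import re

# Hypothetical list of words the learner has been taught to treat as spam markers.
SPAM_WORDS = {"lottery", "winner", "prize"}

def route_mail(text):
    """Return 'spam' if the mail contains any taught spam word, else 'inbox'."""
    words = re.findall(r"[a-z]+", text.lower())
    if any(word in SPAM_WORDS for word in words):
        return "spam"
    return "inbox"

print(route_mail("You are a lottery winner!"))
print(route_mail("Meeting at noon tomorrow"))
```

The filter never asks the user anything; it only applies what it was given, which is what distinguishes it from the active-learning case described next.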


^ir^GlgeOT spam-a(g)fflLU ^([5 6iiair^60)^60)LULL(Lb GlaaecOTi^gaiD^, ^ 6 OTa^ 

LDaJDaa ^([5 L6l6OT6OT@ff^ eui^-^JUGl^sfrl^ 
(anomaly detection), ^6 ot§j ^^gjaGlaaerri^LD Ouai^L©

u^Seugu Saerreiilaeroerr CT(I^lji!j 1, eiSleroLaeoerTU ULU6OTija6fiii_L6l([5^§j 

Glu^gU-sGiaaecOT© ^^6 oti^lju60)i_li!j 1^ a^JD60)6u^ Gl^m_[a(g)§j active learning 
CT 6 OTUU( 5 ili>. 


3. Adversarial Teacher Method: Spam filtering, malware detection, biometric 
recognition SuneAjDeii^jSlGleu^eunLb, SuneAgu ^(^euij Glffiu^uL© 

Glan:(5l'S<aLJUL_(S\6rr6TT efSl^aerr iTjDuulBiiSunS^n/ff^S^.^laaLJulBiiSunS^n: ct§j 
CTgj ^6iigu CT6OTU60)^ CTQ^gJsorruunir. Gleiiguii ^[jeiiaSenn:© 

@Lb(j^60)JDLijlg^Lb a6®fTl«frla(g)a a^i!jlaauu(5lii>Sun:§j, angeocr anijluj (j^eoJDUui^ 
LjliJl^gj a^gu-sGianeheii^^aneOT 6iin:LLJLJi_i .^l60)i_a.^ljD§j. 


4. Online vs Batch Learning: (glL6li_^^^(g) (gliBlLLb LDn:gU'^6OTJD ^geiiaeroena 
aeocraneo^^gj ^^6 oti^lju60)i_lij 1^ online learning CTeOTeiiii, eugeun^gU^ 

^g6iia60)6TT CTlSl^gJ'SO'SneoOT© ^^6 oti^uu60)i_lij 1^ batch learning CT6OT6iiLb 

^60)^aaLJu(5lii. Stock broker a6®frla(g)Lb online-<S(g) a^ngeocTLonaeiiLb, 

LDaaeh Gl^neoa a6®fTlLJi_i [B60)i_Gluguii batch-a(g) s^ngeocTLonaeiiLb 

Glan:6rr6TT6un:Lb. maaerr O^neroa aeocraGlalSlufll^ 1970 - 80, 1981 - 90, 1991 - 
2000, 2001 - 10 CT6OTUgj SuneAgu u^Seugu U(g)^a6Trn:aLJ flliJlaaLJUL© @>6ijGl6iin:([5 
10 6II([5l_^§ja(g)Lb ^LLJ6I1 [B60)l_GlUgU'^lD§J. @^6OTI^LJU60 )I_Li!j 1^ ©«frl6II([5Lb 
6ii([5i_[aa(^aan:6OT LDaaeh O^neroa a6®frlLJi_i CBeoLOuguii- @§jS6ii batch 
processing-a(g) a^ngeocTLona ^60)LDLL(Lb. 


6ii^aa^JDd)l^ aerrerr u^Seugu San:L_un:(5l<a6rr u^j^Illild ^6ii^jS16otui^ 
^60)10^0 u^Seugu 6ii^(j^60)JDa6rr (algorithms) u^jSIilild ©sfrleui^Lb
aneocTeunib. 




Statistical Learning 


i_16rr6ffl 6ijl6ii[j[aa60)6rT4 Glarreocr© ^i^uuewL. 

^^5 a6®fTlLJi_lLb ^geiiaerma ^6ffl4auu(Bii i-ierreffl 6iil6ii[j[aa6ffl6(jr ^i^uu 60 )i_lij 1 S 6 uSlu 
^60)LD^JD§j. ©^06O)aLU i_i6rr6frl 6iil6ii[j[5ja60)6rT^ ^JDLDULa ewaLurreoOT© 
a6®ffl«frla(g)a ©<sn:(Buu§j ctuui^ CT6 OTgu ©uu(g)^Lijl^ arreocreumi. ©gjSeu 

Statistical learning model CT6OTgu ^60)^4aLJu(5iii. 


Domain set: aerretrLcra^ ^(^.^leOTJD i_i6rr6ffl eiSleug^jaSeTT ©eiieurrgu ^60)^4auu(5iii. 

X = {.} CT6OTU§j domain set / instance space CT6OTUu(S\ii. ©^g^erren ^eiiGleurri^

6ijl6iig(j^Lb domain points / instances CTguii) duLuffl^ ^60)^aauu(S\ii. 


a^n:g60CT^§j4(g) ^([5 1000 uaa (S[BaL_(S\ui- 1 ^ 0 '*^^ 6 OT eiileoeueoLU CTeiieueneii 
eoeiiaaeuaLD &t6ot ^([5 algorithm Qy)6ULb a6orfrluu^^(g), ©gjeueog CBaib eiileoeu
(£lij60CTLLjl^§j6rr6TT (S[BaL_(5iui-l^^'*[aa6fii6(jr uaaraaerr a6rr6ifi_n:a 

^6fiiaaUU(S\'^l6OTJD6OT. 


X = [10, 50, 150, .... 600, 800] 


Label set: OeuefilLiJLna 6iig SeueDOTi^LU eiJleiigaaeOTgn: Glu^jSli^aigiLb. Y = 

{.} . 26rr6ifi_n:a ©(^a-^leOTJD ^geiiaefr eueoaiijleOT ^60)LDiL(Lb 

CTguii) LD^LJi_ia6rr aneocruu^ii. 6iil6iig[5ja60)6n:^ a^eiiueuij 

'domain expert' CT6OTgu ^ 60 )^aauu(S\ 6 iin:ij. 





aerrerfLi^^ CBmi ScBnL^ui-l^^'SKjaefrleOT eiJleOTeuagrr @[a(g) aneDOTUuiJlLb. 


Y = [50, 95, 250, .... 750, 999] 


Mapping function: ©g60OT60)i_LL[Lb eweii^gjaGlarreoOT© a 6 rr 6 ifL_(S\'*(g)ii 

Gl 6 ii 6 fiiLUL_(S\a(g)Lb ©60)i_SLULun:6OT Gl0n:i_ij6O)u 6 j^u(S\^§jLb S6 ii60)6U60)lu mapping 
function (f) Glffiu^jngj. ©^60)6 ot ^rreOT [bld§j algorithm [BLb(j^60)LLU 

^g6iia60)6TTLJ u^jS 14 Gi<sn:6rr.^ljD§j. 


f: X -> Y


f : 10 u-saraaeh -> 50 ^umij; 50 uaaraagh -> 95 ^uniu ; 150 uaaraagh -> 250 

0un:LLJ .... 
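As a rough sketch of the mapping function f: X -> Y described above, the page counts and prices from the example can be fitted with a straight line. Only the five (pages, price) pairs that are written out in the text are used; the values hidden behind the "...." are not guessed. The least-squares line and the names `fit_line` and `predict` are illustrative assumptions, not a method the text prescribes.

```python
# Pages (X) and prices in rupees (Y) that are written out in the example.
pages = [10, 50, 150, 600, 800]
prices = [50, 95, 250, 750, 999]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(pages, prices)

def predict(n_pages):
    """The learned mapping: estimated price for a book with n_pages pages."""
    return slope * n_pages + intercept

# Estimate for the 1000-page notebook mentioned in the text.
print(round(predict(1000)))
```

A straight line is chosen here only because it is the simplest mapping that can be learned from a handful of points; any other model could play the role of f.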


Probability Distribution: CBmi GlarrQ'S'^lsjrJD ^geiiaeh ugeueuna ^ 60 )ldlu 

SeueocrlBii. Oeiiguii ^fflijsoOT© ^geiiaew^rr LDL_(5lii Gi<sn:(S\^§J6ijlL-(B «*6®frlLJi_ia6rr 
a^LUgj. a^n:g60CT^§ja(g) 10 uaaib 500 uaaib CTguL6lg60OT(B 

6iil60)6U60)LU LDL_(Bii Gi<sn:(B^§J 6iilL_(B, ^irpGiijeOT 1000 u4a 
6iil60)6U60)LU aeorfilaa.? Glffn:6OT6OTn:^, a6®frlLJi_i ^eiiJDrra^ ^rrem: 

©([5<S(g)Lb. ^g6TT6ii4(g) fffflLuna S6ii60OT(Bi-Dn:6OTn:^, CBnii 

ULU6OTu(B^§Jii ^rreiiaermeOTgj (^60 )jdlij 1^ ugeueurra ^60 )ldlu 

SeueocrlBii. ^^rreiigj 10, 50, 150 .... ct6ot #rrn:6OT (^60 )jdlij 1^ u^Seugu i_i^^a[5ja6frl6OT 


GlarrQ'S'SLJUL (S6ii60OT(S\ii. ©gjSeu 
probability distribution CT6OTLJu(5lii. @^ 6 oti^lju 60 )i_lij 1 ^ CT(5l'*'*uu(5lii 
ffQlLurreOT^rra 


Sample data: rBaii ^guui_iLb ^geiiaSerr sample data 

^^ 6 u§j training data CT 6 OTUu(S\Lb. a^ngeocr^gj-sigi [BLDiBlLLb 500 i_i^^a[aa 6 frl 6 OT 

eiSleoeuai^ii) ^60)6ii ^60)6OT^60)^LL(Lb 

Giftn:(5l^§ju u^aariLD^, 0-50 uaaaagrr OaneoOTL 
^([5 6iil60)6u LD^guii 50 - 100 u-saraagfr GlaneoOTL 

LD^GljDn:([5 eiileoeu &t6otu§j SuneOTgu CBmi ^guui_iLb 

LDcr^iJl^ ^geiiaSen Sample data &T 6 OTUu(S\Lb. 


Learner: CBULb ^giiufllu-ierrCTT Lorr^ffl^ ^geiiaefileOT ^i^lju60)i_lij1^ 

i_l^^a[aa6fil6OT 6 ijl 60 ) 6 U 60 )LU (glij 60 CTLLjluu§j u^jSIlu ^j 5160 ) 6 ii jBLDgj algorithm- 
eu^rrij^gja Glarrerr^JDgj. ^eiieurrgu GtarreoOTL algorithm-^sjrgj learner 

CT6OTgu ^ 60 )^aaLJU(S\^JD§j. (A(S) = Algorithm of sample data) 


A(S) 


Predictor: Learner eueTrij^gja GlarreoOTL ^jSleiileOT Qy)6ULb, label GlaLULUuuLn:^ i_i^lu 
.^ leu i_i6rr6frl efSleugaaerr eui^LoSungj ^eu^eojnGlLu^eumi label-eA 

^ewmaaeumi ct6ot aeorfrluugj Predictor CT6OTUu(S\ii. ©gjGeu hypothesis / classifier 
CTguii u^Geugu GluLuijaefil^ ^ 60 )^aaLJu( 5 l'^ 6 (jrJD 6 OT. (h = hypothesis), ^^rreiigj 
^^auLaiDna 800 uaaKjaerr GlarreoOTL i_i^^a^^6OT efileoeu euewg LDL_(S\ii 
[BLDa(g)^ Gl^ffliLiGlLDsfil^, Gld^ uaaKjaerr aiug aiug efileoeueoLU 


CT 6 ij 6 ii 6 TT 6 ii 60)6iiaa6un:Lb CT6OTU60)^ predictor <gn_guii- 


h: X -> Y


Validation data: ^([5 predictor- 6 OT a 6 ®fTlLJi_i ffijliuna serrerr^n: ct 6 ot 
observation CTeOTeiiii, ageiiii ggeiiaerr validation data 

^60)^aaLJu(5iii. @>([5 predictor-gg Sgijeii GlffLLJ 6 iig^(g) (geoJD^guLS^Lb 30 

observation-^6iigj Sgeroeu. CBmi 500 sample data- 60 ) 6 iia eoau!]!^ 

60 ) 6 iiggj 6 rrS 6 TTn:Lb CT6OTJDn:^, ^ 60 ) 6 ii ^ 60 ) 6 OTg 60 )gLL(Lb GlanQggj learner-ga 

Gleiiguii 300 ggeiiaeoerr LDL_(5iii Gt<an:(5ig§J S6ii60OT(S\Lb. 

Ljl6OT6OTij Gl<sn:(S\gg algorithm Qy)6ULb L6’g(j^6rr6TT 200 

6 iil 60 ) 6 U 60 )LU Glff[r^6u S6ii60OT(5iii>. @6ij6iin:gu algorithm ffiJliLirra 

a6®fTla.£ljDg[r ©^60)6ULun: CT6OTU60)g Sffn:^LJUg^(g) ageiiii) 200 ggeiiaSen 

validation data CT6OTLJu(5lii>. Oungjeiina sample data-eiileOT 25% validation data- 

^60)LDLL(Lb. 
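The split described above (500 samples in hand, 300 given to the learner, the remaining 200 kept back as validation data) could look like the sketch below. The integers standing in for books, the helper name `split_sample`, and the fixed shuffle seed are assumptions made for illustration.

```python
import random

def split_sample(samples, n_train, seed=0):
    """Shuffle a copy of the samples and cut it into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Stand-ins for 500 books whose prices are already known.
samples = list(range(500))

training_data, validation_data = split_sample(samples, n_train=300)
print(len(training_data), len(validation_data))
```

Shuffling before cutting keeps both parts spread across the whole range of books instead of, say, putting all the thin books in the training part.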


Loss / Risk: a6®frlLJi_i CTuSungjii 100% ffiJliLina ^eroiDLungj. ^eiieungu 
geuguii predictor Qy)6ULb CT(S\aaLJu(5lii <a6®frlui_l aeocreoLDLuneOT ld^ui_ii_6ot CT^g 
^6TT6ii 6i]l.£lgg^^ S6iiguu(B'®|D§J CT6OTU60)g4 <gn_gu6iiSg risk 

^(gii. ^gneugj CBmi validation data-g<a Glan:(S\g§J Sffn:^a(gLb Sungj, cbld 
eroaiijl^ aerren a60OT60)LDLun:6OT LD^ui_i4(gLb, ^g6OT a6®frlLJi_i4(gLb serren SeiigyunSL 
'^^ui_l' CT6OTLJu(5lii>. ag[rg60OTggj4(g 850 uaaraaerr GlaneoOTL 

S[BaL_(S\ULlgg'*g^6OT 6i]l60)6U 1200 0Un:LLJ CT6OT [BLDa(g 6J^a6OTS6II GlglJl^grTg^LD, 

algorithm Qy)6ULb a6®frlaaLJu(S\ii> <a6®frlLJun:6OTgj 1190 ^uniu ^^eugj 1210 
0un:LLJ CT6OTgugfT6OT ©(gaigLD. 6jGl6OT«frl^ ©gj6ii60)g ^gj a^gjaGlaneoOTL 
l_lgga[aa6frl6OT 6ijl60)6U60)LU 60)6IlggJ SgagniLILDna 6i]l60)6U60)LUa 
a60^a(gLbSu[rgj, ^gj ©eheungjgneOT ©(gagiii. ©gjSeu ffiJliLirreOT (^60)JDLL(Lb an_i_. 
^^g alSl-aerreii Seiigjun© ©(g^grr^gneOT, CBiogj algorithm ffijliuna Seueroeu 
GlffLu-^ljngj CT6OTgj ^irggii. 6i]l60)i_60)LU iBlag gj^djliLiLDnaa Glanljlggn:^, ^gj 
ggeiiaeroena a^gjaGlaneoOT© <s6®fTlLJi_i (glai^ggnLD^, LDeOTUunLib Glffiugj 
GlaneoOT© @»ui!jl<a.£ljDgj ct6otSjd ^irggii (©g 60 ) 6 OTU u^jSI over-fitting-^ 
aneocTeunib). ©gjSuneOTJD Risk-g ^errefjlQeii^^ 2 uiyaeh serreneOT. (j^gdil^ true 
risk ^(BggJ empirical risk. 



True Risk : 1200 ^urriu ld^lji_i daueocTL (S[BrrL_(S\uLl^^'a^^6OT eijleoeuLuueOTgj 
1190 0un:LLj ct6ot a6®M4aLJu(5iiiSun:§j, ©«5)i_Lijlg^6rr6rT 10 ^urriu CT6OTu§j^n:6OT 
a 60 OT 60 )LDLun: 6 OT risk. © 06 O) 6 OT generalization error CTeOT^ii 
Glun:§jLJU60)i_Lun:6OT ^6OT60 )jd a([56iin:4(g)6ii^n:^ 6 j^u(S\'^16otjd ct6otlj Glurri^err. 
^6OTn:^ SurreOTJD i!jl60)^a60)6rT ^eiiGleurri^ ^g6iil^4(g)Lb ^sfrl^^sfrlLurraa 
aeocra-^lLiSia <9^gu6ii^^(g) u^eurra ^eweiia^frleOT ffgriffiJlLurreOT empirical risk ct6otjd 
^ 6(jrgu a60OTi_j5lLuuu(S\'^liD§J. R(h) ct6otu§j Risk of hypothesis ^^neiigj 

aeo^ULjleOT Qy) 6 ULb CT^a-aijuLL Gl6ii6frlLi5(5lii h(x), mapping Qy) 6 ULb ^ 60 )ldlu 
S eueoOTiyLu aeocTewLDLUueOT Gl 6 ii 6 frlLij(S\ii f(x) ffiDLorra ©^eurr^ ^([5 

Risk CT 6 (jru 60 )^SLU s^gU'^IDgJ- 


Empirical risk: Sffn:^ 60 ) 6 OTaan:a CBmi 200 i_i^^a[aft 60 ) 6 TT ^efituu^na eoeu^gja 
GiftneDOTLn:^, ^60)6ii ^eiiOeuneOTjSleOT aeoOTeoLDLurreOT efUeoeu-sigiLb - 
a 60 ?lTlaauu(S\.® 6 OTJD 6 ijl 60 ) 6 U.S(g)LDn: 6 OT SeiigyufiLeoLa aeoOTLjSl^gj, ^ 6 ii^ 60 )jd 
CT(S\LJU^6(jr Qy)6ULb ^ 60 ) 6 ot^§j^ ^g6iia(^.S(g)LDn:6OT S^ngniLiLDneOT risk ^ 6 OT 60 )jd 
^ eoLD-saeunii. ©gjSeu empirical risk Qy)6ULb a6orfrlaa5LJu(5l'^6OTJD 

i_l^^a[aa6ffl6OT 6ijl60)6ULun:6OT§j S^rrgmjjLDna ©eiieueneii ^umij SeiigyuriLiy^ 
^60)LD^^([5ft(g)Lb CT6OT [BLDI-Dn:^ 6II60)gLUgU^§J'S <9n-JD (J^lylL(Lb. 
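A minimal sketch of the empirical risk idea above: compare each known price in the validation data with the predicted price and summarise the differences as one number. Averaging the absolute differences is an assumption chosen for simplicity (other loss functions are possible), and `empirical_risk` is a hypothetical helper name.

```python
def empirical_risk(actual_prices, predicted_prices):
    """Average absolute difference between known and predicted prices."""
    differences = [abs(actual - predicted)
                   for actual, predicted in zip(actual_prices, predicted_prices)]
    return sum(differences) / len(differences)

# The example from the text: the real price is 1200, the prediction is 1190 or 1210.
print(empirical_risk([1200], [1190]))
print(empirical_risk([1200, 1200], [1190, 1210]))
```

With this summary one can say, for instance, that the predictions are off by about 10 rupees on average.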


Empirical Risk Minimization : [bld§j SOT^ 60 ) 6 OT.san:a u^Seugu algorithms 
GianeDOT© 2(_[56iin:.ss5UUL_i_ u^Geugu predictor-aefil^ ^Ijd^^ 60)^4 aeocTLjSliLi 
validation data- 60 ) 6 ii.s Gian:(5l^§J ^eiiOeun:!^ aeorfrluunguLb ^gn:Luuu(5\.®JD§j. 
u^Geugu observations Qy)6ULb ^ehGleuneOTjSlg^Lb agherr ©^uGleijr s^gnffiJlLurrgOTgj 
ff>6D(jT(S\l!jlly4aLJU(5l'^iD§J. @0^6O)ff>LU ©^UlljlgOT LD^LJI_lff>6frl^ 6T^gU60)l_LU 

LD^LJI _1 ihlaa (g) 60 )JD 6 iin:a aghenG^n: ^^gosyra ff>gDOT(5li^iyuuG^ Empirical Risk 
Minimization gTgOTLJU(S\Lb. iljlgOTgUi^Longu- arg min 


CT6OTU§j argument of minimum ct6otlj Oun:^ 56 rru(S\Lb. ©eiieuagu aeoOTLjSlLuuuLL 
LD^LJLjl 60 ) 6 OT, ^eiiGleuui^ algorithm-ii ^6OT4Gla6OT(SjD Glaa 60 OT(S\ 6 rr 6 rT 
parameters GlaaeoOT© ^6TT6iil(5i'^ltD§J. LjleOTsyrSg a 6 ®frlLJi!jl 60 ) 6 OT 

CT6ij6ii6rT6ii gtiiJii tBiiueuaii), a6orfrlaaLJu(S\ii ld^ui_i ^6 TT6iia(g)^ 

gj^djliLiLDua ©([5a(g)Lb &T6OTU§j SuueOTJD 6iil6)^LU[aaGl6rT^6uaLb aeooTa-^lL©-* 
@ 60 )^lj u^jSI PAC Model-^ aaeocreuaLD.
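Empirical Risk Minimization, as described above, amounts to an arg min over candidate predictors: evaluate each one on the validation data and keep the one with the smallest empirical risk. In the sketch below the three candidate pricing rules are invented for illustration, the validation pairs reuse (pages, price) values from the earlier example, and the mean absolute error is an assumed loss.

```python
def empirical_risk(predictor, validation_data):
    """Mean absolute error of a predictor over (pages, price) pairs."""
    errors = [abs(price - predictor(pages)) for pages, price in validation_data]
    return sum(errors) / len(errors)

# (pages, price) pairs reused from the earlier example, purely for illustration.
validation_data = [(10, 50), (50, 95), (150, 250)]

# Hypothetical candidate predictors, as if produced by different algorithms.
candidates = {
    "one_rupee_per_page": lambda pages: 1.0 * pages,
    "two_rupees_per_page": lambda pages: 2.0 * pages,
    "flat_100": lambda pages: 100.0,
}

# arg min: the candidate whose empirical risk is smallest.
best_name = min(candidates,
                key=lambda name: empirical_risk(candidates[name], validation_data))
print(best_name)
```

Python's `min` with a `key` function plays the role of arg min here: it returns the candidate itself rather than the minimum risk value.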


Probably Approximately Correct (PAC Method)

The PAC method works out how correct a prediction made by a predictor is likely to be, and how far it can be trusted. It first lays down, through definitions, the properties that must hold for a predictor's output to be probably approximately correct: the learner must avoid over-fitting, it must have an inductive bias, the training data must be drawn i.i.d., and the sample complexity must be large enough for the prediction to come out right. The accuracy and confidence parameters are then used to state how trustworthy the prediction is. This method relies on an assumption called the realizability assumption; the Agnostic PAC model removes it.

Each of the terms mentioned above is explained below.


Overfitting: When a learner is given its initial data and memorises every single record too closely, there is a danger called overfitting. Instead of understanding the data in general terms, such a learner simply learns it by heart. On the data it was tested with, it returns exactly the value we expect, so its risk is zero. Even so, it cannot be accepted as a good predictor, because after training it cannot predict properly for new data. What we do to avoid overfitting is called inductive bias.


Inductive bias: The hypothesis class describes the relationship by which each sample record is to be associated with a label. This is the inductive bias. "Biased" means leaning towards one side. In this approach the learner builds up its knowledge of the data on the basis of the relationships expressed by the hypothesis class. Making predictions on the basis of such general knowledge is called inductive bias, and it is the right way to proceed.


Hypothesis Class: The hypothesis class is what gives a learner its inductive bias. It can be divided into two kinds, finite and infinite. A hypothesis can be thought of as an assumption.

When the set of possible outcomes can be defined in advance and predictions are made within it, we have a finite hypothesis class. For example, suppose someone who logs in to youtube keeps listening to devotional songs in the morning and Ilaiyaraaja songs in the evening. Their hypothesis class then consists of two kinds, devotional songs and Ilaiyaraaja songs. This is an example of a finite hypothesis class.

Another person, on the other hand, may listen to love songs, sentimental songs, comedy, fight scenes, dance numbers, children's songs and so on, switching between kinds all the time, so that we cannot define how many kinds there will be. For such a person the hypothesis class keeps growing without a bound we can fix, and this is an example of an infinite hypothesis class.


Sample complexity: If the number of sample records is far too small, or needlessly large, prediction does not go well. A predictor's success therefore depends on the number of sample records it is given. Sample complexity is the count of how many records must be supplied, roughly, for its predictions to be reasonably correct.

In $S \sim D^{m}$, m is the number of records taken as the sample S from the distribution D.

The sample records are supplied under an assumption called i.i.d., which stands for independently identically distributed. It guarantees that the records handed to the learner for training are independent samples that have no connection with one another.


Realizability assumption: Our algorithm grows out of an assumption such as the one in the books example seen earlier, that as the number of pages increases the price increases as well. This is called the realizability assumption. The assumption does not suit every kind of prediction. For example, once a coin has been tossed, no assumption can tell us whether it will land heads or tails. Predictions with this kind of uncertainty are handled by the Agnostic PAC model.


Accuracy parameter: The symbol ε is used to denote how accurate the value returned by a predictor/classifier is. R(h) > ε is taken to mean that the predictor has failed, and R(h) <= ε that it is an approximately good predictor.


Confidence parameter: This is defined in terms of the value delta (δ). The probability that what we expect and what the predictor chooses both turn out right is taken as 1, and the probability that they turn out wrong as 0. Seen this way, if δ is the probability that the sample misleads the predictor, then 1-δ is the probability that the predictor picks up the true value. The confidence parameter (1-δ) therefore denotes how far the sample can be trusted.
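To make the roles of the accuracy and confidence parameters concrete, here is a small sketch that uses the standard sample-complexity bound for a finite hypothesis class under the realizability assumption, m >= (1/ε) ln(|H|/δ). This bound is taken from the general PAC-learning literature rather than from this text, and the function name and the numbers are purely illustrative.

```python
import math


def pac_sample_size(hypothesis_count, epsilon, delta):
    """Number of samples sufficient for a finite hypothesis class
    (realizable case): m >= (1 / epsilon) * ln(|H| / delta)."""
    return math.ceil(math.log(hypothesis_count / delta) / epsilon)


# 1000 hypotheses, error at most 0.1, confidence 1 - 0.05 = 95%.
m = pac_sample_size(hypothesis_count=1000, epsilon=0.1, delta=0.05)
print(m)

# Asking for a ten times smaller error needs roughly ten times more samples.
m_tighter = pac_sample_size(hypothesis_count=1000, epsilon=0.01, delta=0.05)
print(m_tighter)
```

Tightening ε (accuracy) raises the sample size linearly, while tightening δ (confidence) raises it only logarithmically.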


Using everything we have learned so far, let us now see how simple linear regression works.



Linear Regression 


Simple Linear Regression 


Simple linear regression is one of the basic algorithms in machine learning. With it we are going to work through, on real data, how two quantities are related to each other, how the algorithm learns that relationship, and how well it predicts. As an example, we will see how the price of a pizza can be determined from its size. All the pizza sizes and prices we have so far must be placed in the variables x and y. These are the domain set and the label set.

x=[6,8,10,14,18,21]
y=[7,9,13,17.5,18,24]

The diameters of the various pizzas (in inches), x, are called the explanatory variable, and the prices that go with them (in dollars), y, are called the response variable. First let us draw these as data points on a graph; only then will we see how they are related. matplotlib is a library for drawing graphs, and its pyplot module is used here to plot our data points. The program is as follows.


https://gist.github.com/nithyadurai87/cb77831526033da63be0790f917efe63

import matplotlib.pyplot as plt

x = [[6], [8], [10], [14], [18], [21]]
y = [[7], [9], [13], [17.5], [18], [24]]

plt.figure()
plt.title('Pizza price statistics')
plt.xlabel('Diameter (inches)')
plt.ylabel('Price (dollars)')
plt.plot(x, y, '.')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()


The graph it produces looks like this.

[Figure: scatter plot of pizza diameter (inches) against price (dollars)]

In the graph we can see a linear relationship between the diameter of a pizza and its price. A relationship is linear when one value rises as the other rises, and that is the case here. The program that feeds this data to an algorithm is as follows.


https://gist.github.com/nithyadurai87/d94507f9052a6120dce5f20e31806cea

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]

model = LinearRegression()
model.fit(x, y)

plt.figure()
plt.title('Pizza price statistics')
plt.xlabel('Diameter (inches)')
plt.ylabel('Price (dollars)')
plt.plot(x, y, '.')
plt.plot(x, model.predict(x), '--')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

print("Predicted price = ", model.predict([[21]]))


Explanation of the program: sklearn is a package that contains a variety of algorithms. The LinearRegression() class is imported from its linear_model library. This is the predictor. The fit() method is used to hand our data to it. A graph is then drawn with pyplot to show how our model has understood the data. Finally the predict() function is applied to our model to predict the price of a pizza 21 inches across.


Output of the program: When the above program is run, it displays a graph and then prints [[22.46767241]] as the price of a 21-inch pizza.

[Figure: the data points with the fitted regression line drawn as a dashed line]

In the graph we can see a straight line that passes close to all the data points. This is called the hyperplane. From now on the algorithm predicts the price for any pizza diameter using this line. Notice the gap between the predicted line and the points where the true values lie. This gap is called the residuals, or the training error.

We already know that a pizza 21 inches across costs 24 dollars, yet our model predicts about 22 dollars. This is called the generalization error, or risk: the error that arises when a pattern is generalised and then used for prediction. Residual sum of squares is a function for calculating the risk, and it is also called the loss function or cost function. It squares the residuals and averages them. What causes the risk, how it is calculated, and how it is found are covered next.


Algorithm - Simple linear:

The equation that our predictor learns through fit() is shown below. This is the algorithm for simple linear regression.

y = α + βx

Along with our explanatory and response variables there are two parameters, α (the intercept term) and β (the coefficient). Our algorithm learns the values of these two. They are what determine the model's risk, so once we know how they are obtained we also know how the risk is reduced. The value of β has to be found first; the value of α is then found using it.

$\beta = \dfrac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}, \qquad \alpha = \bar{y} - \beta\,\bar{x}$

Variance indicates how widely the values of our explanatory variable are spread around their mean. A set whose values are all equal, such as [5, 5, 5, 5], has variance 0. An evenly spaced set such as [1, 3, 5, 7, 9, 11] still has a non-zero variance, because its values differ from their mean, and so does an unevenly spaced set such as [1, 5, 7, 10, 11].

Co-variance indicates how far our explanatory and response variables vary together. If there is no linear relationship between the two, its value is 0. The formulas are as follows.

$\mathrm{var}(x) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad \mathrm{cov}(x, y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

A few functions in the numpy library apply these formulas to our data and return the answers.




https://gist.github.com/nithyadurai87/406747e718d04a4bc339f740b5f9de62

from sklearn.linear_model import LinearRegression
import numpy as np

x = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
model = LinearRegression()
model.fit(x, y)

# np.mean averages the squared residuals (sum of squares divided by the number of records)
print("Residual sum of squares = ", np.mean((model.predict(x) - y) ** 2))
# ddof=1 divides by n-1 (sample variance)
print("Variance = ", np.var([6, 8, 10, 14, 18], ddof=1))
print("Co-variance = ", np.cov([6, 8, 10, 14, 18], [7, 9, 13, 17.5, 18])[0][1])
print("X_Mean = ", np.mean(x))
print("Y_Mean = ", np.mean(y))

The following values are printed as output.

Residual sum of squares = 1.7495689655172406
Variance = 23.2
Co-variance = 22.650000000000002
X_Mean = 11.2
Y_Mean = 12.9


Substituting these values into the equation shows how the model arrives at about 22.46 dollars as the price of a pizza 21 inches across.

y = α + βx
= α + β(21)
= 1.92 + (0.98 * 21) = 1.92 + 20.58 = 22.5

where

β = 22.65 / 23.2 = 0.98
α = 12.9 - (0.98 * 11.2) = 12.9 - 10.976 = 1.92

The small difference between 22.5 and 22.46 comes from rounding β to two decimal places.
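The hand calculation above can be reproduced without any library. The sketch below is only an illustration of the same formulas (sample variance and co-variance with n-1 in the denominator, then β and α); the variable names are my own.

```python
# Training data used in the programs above.
x = [6, 8, 10, 14, 18]
y = [7, 9, 13, 17.5, 18]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Sample variance of x and sample co-variance of x and y (divide by n - 1).
variance = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
covariance = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)

beta = covariance / variance        # coefficient
alpha = y_mean - beta * x_mean      # intercept term

predicted_price = alpha + beta * 21
print("beta =", beta, "alpha =", alpha)
print("Predicted price for 21 inches =", predicted_price)
```

Because β is not rounded here, the result agrees with the 22.4677 printed by the model rather than the rounded 22.5.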



R-squared Score: Finally, the R-squared score measures how well the model we have built is suited to returning values close to the true ones.


https://gist.github.com/nithyadurai87/a39ecee72dc4a266933621c298e80df9

from sklearn.linear_model import LinearRegression

x = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]
model = LinearRegression()
model.fit(x, y)

x_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]

print("Score = ", model.score(x_test, y_test))

Score = 0.6620052929422553


The score() function tells us how well the model fits our validation data. The value it returns normally lies between 0 and 1 (it can even be negative for a model that fits very badly). Since a score of exactly 1 usually points to overfitting, we accept a value close to 1 instead. Here our model's score is 0.66. Multiple linear regression gives a better accuracy than simple linear regression, and we will look at that next.
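For readers who want to see what lies behind score(), the following sketch recomputes the R-squared value by hand as 1 - SS_res / SS_tot, which is the definition scikit-learn documents for regressors. The line is fitted with the covariance and variance formulas used earlier; the variable names are illustrative.

```python
train_x = [6, 8, 10, 14, 18]
train_y = [7, 9, 13, 17.5, 18]
test_x = [8, 9, 11, 16, 12]
test_y = [11, 8.5, 15, 18, 11]

# Fit the line y = alpha + beta * x on the training data.
n = len(train_x)
x_mean = sum(train_x) / n
y_mean = sum(train_y) / n
beta = (sum((a - x_mean) * (b - y_mean) for a, b in zip(train_x, train_y))
        / sum((a - x_mean) ** 2 for a in train_x))
alpha = y_mean - beta * x_mean

# R-squared on the test data: 1 - (residual sum of squares / total sum of squares).
predictions = [alpha + beta * a for a in test_x]
test_mean = sum(test_y) / len(test_y)
ss_res = sum((b - p) ** 2 for b, p in zip(test_y, predictions))
ss_tot = sum((b - test_mean) ** 2 for b in test_y)
r_squared = 1 - ss_res / ss_tot
print("R-squared =", r_squared)
```

A value of 1 would mean every test point lies exactly on the line, and 0 means the line does no better than always predicting the mean of the test prices.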



Multiple Linear Regression 


In simple linear regression we saw the price of a pizza depending on its diameter. In reality the price also depends on the toppings sprinkled over it. The price of a pizza therefore depends on two things, its diameter and the number of toppings it has. When the response variable depends on more than one explanatory variable like this, it is called multiple linear regression. Its equation is as follows.

y = α + β1x1 + β2x2 + ... + βnxn

In the example above, the number of toppings is added alongside the existing explanatory variable so that a multiple linear model can be built. The program is as follows.


https://gist.github.com/nithyadurai87/7068c32bd4d7fccb67ccca39623f68bc

from sklearn.linear_model import LinearRegression
from numpy.linalg import lstsq
import numpy as np

x = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]

model = LinearRegression()
model.fit(x, y)

x1 = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]
y1 = [[11], [8.5], [15], [18], [11]]

predictions = model.predict([[8, 2], [9, 0], [12, 0]])
print("values of Predictions: ", predictions)
print("values of β1, β2: ", lstsq(x, y, rcond=None)[0])
print("Score = ", model.score(x1, y1))


Output of the program:

The output of the program is shown below. Look at the accuracy: it was 0.66 for simple linear and is 0.77 for multiple linear. So the accuracy is higher when multiple linear regression is used instead of simple linear regression.


values of Predictions: [[10.0625 ]
 [10.28125]
 [13.3125 ]]
values of β1, β2: [[1.08548851]
 [0.65517241]]
Score = 0.7701677731318468


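Substituting these values into the equation, they fit as follows. The value of the intercept term α does not depend on the two variables x1 and x2, so it is normally a constant.

10.06 = α + (1.09 * 8) + (0.66 * 2)
= α + 8.72 + 1.32
= α + 10.04

10.28 = α + (1.09 * 9) + (0.66 * 0)
= α + 9.81 + 0
= α + 9.81

13.31 = α + (1.09 * 12) + (0.66 * 0)
= α + 13.08 + 0
= α + 13.08

Note, however, that the lstsq call in the program fits the data without an intercept column, so 1.09 and 0.66 are not the coefficients used by the fitted LinearRegression model. That is why the α implied by the three lines above is not the same each time (about 0.02, 0.47 and 0.23). The model's own values are available as model.coef_ and model.intercept_.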
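The three substitutions above do not yield one single α, because the coefficients printed by lstsq were fitted without an intercept. As a cross-check, the sketch below solves the two-feature least-squares problem with an intercept directly, using the centred sums of squares and products. It is a plain-Python illustration with names of my own choosing, not the book's program.

```python
diameter = [6, 8, 10, 14, 18]      # x1
toppings = [2, 1, 0, 2, 0]         # x2
price = [7, 9, 13, 17.5, 18]       # y
n = len(price)

m1 = sum(diameter) / n
m2 = sum(toppings) / n
my = sum(price) / n

# Centred sums of squares and cross-products.
s11 = sum((a - m1) ** 2 for a in diameter)
s22 = sum((b - m2) ** 2 for b in toppings)
s12 = sum((a - m1) * (b - m2) for a, b in zip(diameter, toppings))
s1y = sum((a - m1) * (c - my) for a, c in zip(diameter, price))
s2y = sum((b - m2) * (c - my) for b, c in zip(toppings, price))

# Solve the 2x2 normal equations by Cramer's rule.
det = s11 * s22 - s12 ** 2
beta1 = (s1y * s22 - s12 * s2y) / det
beta2 = (s11 * s2y - s12 * s1y) / det
alpha = my - beta1 * m1 - beta2 * m2


def predict(d, t):
    """Price predicted for diameter d and t toppings."""
    return alpha + beta1 * d + beta2 * t


print("alpha =", alpha, "beta1 =", beta1, "beta2 =", beta2)
print(predict(8, 2), predict(9, 0), predict(12, 0))
```

With these values a single constant α reproduces all three predictions printed by the model, which is the behaviour the text describes for the intercept term.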



Simple Linear Algorithm 


The equation for simple linear regression is as follows.

h(x) = theta-0 + theta-1 * x

As an example, let us take the data points (1,1), (2,2), (3,3) and see how the predictor h(x) predicts for them using this equation.

The predictor depends on two important parameters, theta-0 and theta-1. These are the same parameters that were called alpha and beta earlier. Below we see how the predictions come out in different ways for different values of the parameters.


https://gist.github.com/nithyadurai87/c57ac1197368249f015ed4d1dba029f0

import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [1, 2, 3]

plt.figure()
plt.title('Data - X and Y')
plt.plot(x, y, '*')
plt.xticks([0, 1, 2, 3])
plt.yticks([0, 1, 2, 3])
plt.show()

def linear_regression(theta0, theta1):
    predicted_y = []
    for i in x:
        predicted_y.append(theta0 + (theta1 * i))
    plt.figure()
    plt.title('Predictions')
    plt.plot(x, predicted_y, '.')
    plt.xticks([0, 1, 2, 3])
    plt.yticks([0, 1, 2, 3])
    plt.show()

theta0 = 1.5
theta1 = 0
linear_regression(theta0, theta1)

theta0a = 0
theta1a = 1.5
linear_regression(theta0a, theta1a)

theta0b = 1
theta1b = 0.5
linear_regression(theta0b, theta1b)


First a graph of (1,1), (2,2), (3,3) is drawn. Then, with theta-0 = 1.5 and theta-1 = 0, substituting into the equation gives the values (1, 1.5), (2, 1.5), (3, 1.5):

h(1) = 1.5 + 0(1) = 1.5
h(2) = 1.5 + 0(2) = 1.5
h(3) = 1.5 + 0(3) = 1.5

In the same way, theta-0 = 0 and theta-1 = 1.5 give (1, 1.5), (2, 3), (3, 4.5):

h(1) = 0 + 1.5(1) = 1.5
h(2) = 0 + 1.5(2) = 3
h(3) = 0 + 1.5(3) = 4.5

and finally theta-0 = 1 and theta-1 = 0.5 give (1, 1.5), (2, 2), (3, 2.5):

h(1) = 1 + 0.5(1) = 1.5
h(2) = 1 + 0.5(2) = 2
h(3) = 1 + 0.5(3) = 2.5

A graph is drawn for each set of values found this way.

[Figures: the three prediction plots, one for each pair of theta values]

Of the three predictions above, we can take the theta values of whichever one comes closest to the true values and use them for the final prediction. Here, for the true values (1,1), (2,2), (3,3), the values (1, 1.5), (2, 2), (3, 2.5) come closest. So we choose the predictor with theta-0 = 1 and theta-1 = 0.5.

Since there are only 3 records here, we can easily tell by eye which prediction differs least from the true values. When there is a large amount of data, the formula that helps us work out this difference is the cost function.


Cost Function: Its equation is as follows.

$J(\theta_0, \theta_1) = \dfrac{1}{2m} \sum_{i=1}^{m} \big(h(x_i) - y_i\big)^2$

J = cost function
m = the total number of records
i = the index that runs through each of the records
h(x) = the predicted value
y = the true value we expect

Let us substitute the data points above into this equation and check whether the predictor we selected really gives the smallest cost. The program is as follows.


https://gist.github.com/nithyadurai87/86bd4ec2288d0e9af138a30a7af44a09

Output:

cost when theta0=1.5 theta1=0 : 0.4583333333333333
cost when theta0=0 theta1=1.5 : 0.5833333333333333
cost when theta0=1 theta1=0.5 : 0.08333333333333333

Explanation:

(1, 1.5), (2, 1.5), (3, 1.5) vs (1, 1), (2, 2), (3, 3) (theta-0 = 1.5, theta-1 = 0)

J = 1/(2*3) [(1.5-1)**2 + (1.5-2)**2 + (1.5-3)**2] = 1/6 [0.25 + 0.25 + 2.25] = 2.75/6 = 0.458

(1, 1.5), (2, 3), (3, 4.5) vs (1, 1), (2, 2), (3, 3) (theta-0 = 0, theta-1 = 1.5)

J = 1/(2*3) [(1.5-1)**2 + (3-2)**2 + (4.5-3)**2] = 1/6 [0.25 + 1 + 2.25] = 3.50/6 = 0.583

(1, 1.5), (2, 2), (3, 2.5) vs (1, 1), (2, 2), (3, 3) (theta-0 = 1, theta-1 = 0.5)

J = 1/(2*3) [(1.5-1)**2 + (2-2)**2 + (2.5-3)**2] = 1/6 [0.25 + 0 + 0.25] = 0.50/6 = 0.083
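Only the gist link and its output are given for the cost program, so the following is a minimal sketch of what such a program might look like, written from the formula J = 1/(2m) Σ (h(x) - y)². The function name `cost` is an assumption of mine; the data points and the three theta pairs are the ones used in the text.

```python
x = [1, 2, 3]
y = [1, 2, 3]


def cost(theta0, theta1):
    """Half the mean squared error of h(x) = theta0 + theta1 * x on the data."""
    m = len(y)
    squared_errors = [((theta0 + theta1 * xi) - yi) ** 2 for xi, yi in zip(x, y)]
    return sum(squared_errors) / (2 * m)


for theta0, theta1 in [(1.5, 0), (0, 1.5), (1, 0.5)]:
    print("cost when theta0=%s theta1=%s :" % (theta0, theta1), cost(theta0, theta1))
```

The pair theta0 = 1, theta1 = 0.5 gives the smallest of the three costs, which matches the choice made by eye earlier.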


By adding up the differences, we can tell how far the predictions of each predictor are from the true values. The differences are squared before they are added, and the total is then divided by 2, because a difference may carry a negative sign and would otherwise cancel out a positive one when summed. That is why the formula is designed this way. It is called the sum of squares error.

Here the predictor with theta-0 = 1 and theta-1 = 0.5, which we had already picked, gives the smallest cost of the three (0.083, from a squared-error sum of 0.50). So even when the number of records grows, the cost function lets us calculate the difference easily.

We need not stop at a cost of 0.083. Gradient descent is the method that keeps trying other theta values and helps us find the thetas that give an even smaller difference. Before that, let us draw the values of the thetas and their costs as a graph. These graphs are called contour plots.


Contour plots:

A contour plot helps us draw three quantities, theta-0, theta-1 and the cost computed from the two, as a three-dimensional graph. The graph may be shaped like a bowl or like a set of circles. If each point is placed as (theta-0, theta-1, cost) and the points are joined, a bowl-like shape appears.

In the circular form, the costs for different theta values appear as different circles. So by finding the centre of the circles, or by reaching the bottom of the bowl, we can find the thetas that give the smallest difference. This is exactly the job gradient descent does.

In the example below the thetas are given 100 values each in the range -2 to 2, and the cost is calculated for every combination. The values for the thetas are generated with numpy, and they are evenly spaced (uniformly distributed) over the range.


https://gist.github.com/nithyadurai87/8c120370181f5bb9ad966dc9fdd7935b

from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

fig, ax1 = plt.subplots(figsize=(8, 5), subplot_kw={'projection': '3d'})

values = 2
r = np.linspace(-values, values, 100)
theta0, theta1 = np.meshgrid(r, r)

original_y = [1, 2, 3]
m = len(original_y)

predicted_y = [theta0 + (theta1 * 1), theta0 + (theta1 * 2), theta0 + (theta1 * 3)]

# named total rather than sum, to avoid shadowing the built-in sum()
total = 0
for i, j in zip(predicted_y, original_y):
    total = total + ((i - j) ** 2)
J = 1 / (2 * m) * total

ax1.plot_wireframe(theta0, theta1, J)
ax1.set_title("plot")
plt.show()


The graph for our data:


Gradient descent

Gradient descent does the work of finding the theta values that produce the smallest difference. It first gives the thetas some particular values and calculates the cost for them. It then reduces the thetas from those values by a certain amount and calculates the cost again. In this way it moves down a little in every iteration until it finds the smallest cost. The equation is as follows.

$\theta_j := \theta_j - \alpha \, \dfrac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{for } j = 0, 1$

At the end of every iteration the values of theta-0 and theta-1 must both be updated together. This is called simultaneous update. If the graph is a bowl, gradient descent finds its bottom; if it is a set of circles, it finds their centre. This is the route to the global optimum. A local optimum is different: if the cost values keep rising and falling, each dip is called a local optimum. In general the graph for linear regression has no local minimum, only the global optimum.

Alpha indicates by how much the theta values are to be reduced each time. It must be neither too small nor too large, but just right. Typical values are 0.1, 0.01 and 0.001.


In each of the three pictures above, assume that the value of J sits at the marked point. If alpha is very small, the point moves down in tiny steps and takes a very long time to reach the global optimum. It is like a baby walking with very small steps. Likewise, if J is already very close to the global optimum and the step taken is very large, it jumps past the global optimum and lands somewhere else. It may keep going back and forth in this way without ever reaching the centre. So the value of alpha has to be chosen carefully.


Alongside alpha, the update needs the partial derivative of the cost with respect to each theta. In simple linear regression theta-0 stands alone, whereas theta-1 is multiplied by x (h(x) = theta-0 + theta-1 * x). The partial derivatives are therefore as follows.

$\dfrac{\partial J}{\partial \theta_0} = \dfrac{1}{m} \sum_{i=1}^{m} \big(h(x_i) - y_i\big), \qquad \dfrac{\partial J}{\partial \theta_1} = \dfrac{1}{m} \sum_{i=1}^{m} \big(h(x_i) - y_i\big)\, x_i$

In the example below the value of theta-0 is held fixed, and only the value of theta-1 is reduced by gradient descent. 50 iterations are run. The cost found in each iteration is stored in a list called J_history, and the smallest value is then selected. The partial derivative is stored under the name delta. Its value is calculated as follows.

delta = 1/m . (h(x) - y).x
= 1/m . (theta1.x - y).x where h(x) = theta1.x
= 1/m . (x.theta1.x - x.y)

https://gist.github.com/nithyadurai87/43664cacd625e7c290c8812894dca659

x = [1, 2, 3]
y = [1, 2, 3]

m = len(y)

theta0 = 1
theta1 = 1.5
alpha = 0.01

def cost_function(theta0, theta1):
    predicted_y = [theta0 + (theta1 * 1), theta0 + (theta1 * 2), theta0 + (theta1 * 3)]
    total = 0
    for i, j in zip(predicted_y, y):
        total = total + ((i - j) ** 2)
    J = 1 / (2 * m) * total
    return J

def gradientDescent(x, y, theta1, alpha):
    J_history = []
    for _ in range(50):
        # theta1 is updated once per record; delta assumes h(x) = theta1 * x
        for i, j in zip(x, y):
            delta = 1 / m * (i * i * theta1 - i * j)
            theta1 = theta1 - alpha * delta
        # the cost still uses the fixed global theta0
        J_history.append(cost_function(theta0, theta1))
    print(min(J_history))

gradientDescent(x, y, theta1, alpha)

0.5995100694321308



In gradient descent we can run as many iterations as we like and pick the smallest cost. Moreover, if the same cost keeps being produced again and again, it means we have reached the global optimum. For example, if the value of J over the iterations in the range 300 to 400 falls only by a very small amount (<0.001), the global optimum has been reached. This check is called the automatic convergence test.
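The simultaneous update and the automatic convergence test can be put together in a few lines. The sketch below is not the book's program: it updates both thetas from the full batch of records in every iteration and stops once the cost falls by less than a tolerance. The function names, the tolerance and the iteration cap are assumptions chosen for illustration.

```python
x = [1, 2, 3]
y = [1, 2, 3]
m = len(y)


def cost(theta0, theta1):
    """J = 1/(2m) * sum of squared errors."""
    return sum(((theta0 + theta1 * xi) - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)


def gradient_descent(theta0, theta1, alpha=0.1, tolerance=1e-9, max_iters=5000):
    history = [cost(theta0, theta1)]
    for _ in range(max_iters):
        errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
        grad0 = sum(errors) / m                                  # dJ/dtheta0
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m      # dJ/dtheta1
        # Simultaneous update: both gradients use the old theta values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        history.append(cost(theta0, theta1))
        # Automatic convergence test: stop when the cost barely changes.
        if history[-2] - history[-1] < tolerance:
            break
    return theta0, theta1, history


theta0, theta1, history = gradient_descent(1.0, 1.5)
print("theta0 =", theta0, "theta1 =", theta1)
print("iterations =", len(history) - 1, "final cost =", history[-1])
```

For this data the true line is y = x, so the thetas should settle near 0 and 1. A much larger alpha would make the updates overshoot, as described for the choice of alpha above.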



Matrix

Numbers arranged in rows and columns are called a matrix. In simple linear regression we used one number to predict another number. In multiple linear regression we are going to combine more than one number to predict another number. For instance, predicting the price of a house from its area in square feet alone is simple linear, whereas predicting it from several factors such as the number of rooms and how many years old the house is, is multiple linear. So before learning about that, we need to learn a few basics, such as how more than one number is arranged in a matrix and how calculations are done with numbers arranged that way.


Dimension:

The number of rows and columns in a matrix is called the dimension of the matrix. A matrix A with 2 rows and 3 columns is called a 2*3 dimensional matrix. To refer to a value in the matrix, we state which row and which column of A it sits in. A22, for example, refers to the value 5 in the second row and second column.




In multiple linear regression, the dimension of a matrix depends on the number of training records and the number of features taken for prediction. That is,

rows = no. of records
columns = no. of features

A matrix with a single column is called a vector. To refer to a value in it, giving the row number alone is enough. B3 refers to the value 38 in the third row. A vector can be either 0-indexed or 1-indexed. B3 refers to 38 if the vector is 1-indexed, and to 47 if it is 0-indexed.



Matrix addition:

Two matrices can be added only if their dimensions are equal, and the result is another matrix of the same dimension.

If (3*2) + (3*2) = 3*2

1+7 2+8
3+9 4+10
5+11 6+12


Matrix multiplication:

Two matrices can be multiplied to form another matrix only if the number of columns in the first matrix is equal to the number of rows in the second. The dimension of the newly formed matrix takes the rows of the first matrix and the columns of the second.

If (3*2) * (2*2) = 3*2

The values in a row of the first matrix are multiplied one by one with the values in a column of the second matrix, and the products are then added together. This is how matrix multiplication works.

(1*7)+(2*9) (1*8)+(2*10) = 7+18 8+20
(3*7)+(4*9) (3*8)+(4*10) = 21+36 24+40
(5*7)+(6*9) (5*8)+(6*10) = 35+54 40+60


transpose:

Turning the rows of a matrix into its columns gives the transpose of the matrix.

Inverse:

The inverse of a matrix is calculated in a fairly involved way. For a matrix of dimension 2*2 it can be calculated as follows. First, 1 is divided by the difference between the product of A11 and A22 and the product of A12 and A21. Then the values A11 and A22 in the matrix swap places, the values A12 and A21 change sign, and the resulting matrix is multiplied by that fraction.

$A^{-1} = \dfrac{1}{A_{11}A_{22} - A_{12}A_{21}} \begin{bmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{bmatrix}$






Identity Matrix: 


A matrix with an equal number of rows and columns is called a square matrix. A square matrix that has 1 only along its diagonal and zero everywhere else is called the Identity Matrix. Multiplying a matrix by its inverse produces the Identity matrix.
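The transpose, the 2*2 inverse rule and the identity property can be tried out together. The sketch below is illustrative only: the matrix A and the helper name inverse_2x2 are assumptions, not taken from the text.

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])  # hypothetical 2*2 matrix


def inverse_2x2(m):
    """Invert a 2*2 matrix using the rule described in the text."""
    a11, a12 = m[0]
    a21, a22 = m[1]
    det = a11 * a22 - a12 * a21          # (A11*A22) - (A12*A21)
    if det == 0:
        raise ValueError("matrix has no inverse")
    # Swap A11 and A22, negate A12 and A21, then scale by 1/det.
    return (1.0 / det) * np.array([[a22, -a12], [-a21, a11]])


A_inv = inverse_2x2(A)
print(A.T)           # rows become columns
print(A_inv)
print(A @ A_inv)     # approximately the 2*2 identity matrix
```

For anything larger than 2*2, np.linalg.inv performs the same job.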






Multiple Linear Algorithm 


When more than one feature together predict a value, it is called multiple linear regression. If the features are taken as x1, x2, x3.., the equation takes the following form.

In multiple linear regression there is one theta value for each feature, and the number of theta values does not change with the number of rows. So theta is always a matrix with several values in 1 row. This matrix can then be transposed into a vector with several values in 1 column. This is why the equation for multiple linear regression comes out as the product of the transposed theta matrix and the X matrix of features.

In the equation, a new feature called x0 is attached to theta0. It always holds the value 1. The new feature causes no change to the value of theta0. It has been added only to make the matrix multiplication work.
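Written out, the hypothesis described above has the following form. This is the standard notation for multiple linear regression; the symbol n for the number of features is an assumption made here for the sake of the formula.

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n
            = \theta^{T} x , \qquad x_0 = 1
```

For example, with theta = (100, 1000, 10000, 100000) and x = (1, 800, 2, 15), this gives 100 + 800000 + 20000 + 1500000 = 2320100.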


In the following example,

800 square feet, 2 rooms, 15-year-old house, price = 3000000
1200 square feet, 3 rooms, 1-year-old house, price = 2000000
2400 square feet, 5 rooms, 5-year-old house, price = 3500000

these 3 records are given in the matrix X. In the same way the values 100, 1000, 10000, 100000 are given in the matrix theta as the values of theta0, theta1, theta2, theta3. The two are combined according to the equation above to produce the matrix h(x).

https://gist.github.com/nithyadurai87/5abf51e4b26717a3427d15fcaca0f48f


import numpy as np

# Each row is [x0, square feet, rooms, age in years]; x0 is always 1.
X = np.array([[1, 800, 2, 15], [1, 1200, 3, 1], [1, 2400, 5, 5]])
y = np.array([3000000, 2000000, 3500000])
theta = np.array([100, 1000, 10000, 100000])

# h(x) = X . theta (transposing a 1-D array leaves it unchanged)
predicted_y = X.dot(theta.transpose())
print(predicted_y)

m = y.size

diff = predicted_y - y
squares = np.square(diff)
sum_of_squares = np.sum(squares)
cost_fn = 1 / (2 * m) * sum_of_squares
print(diff)
print(squares)
print(sum_of_squares)
print(cost_fn)


Output:


[2320100 1330100 2950100] 

[-679900 -669900 -549900] 

[462264010000 448766010000 302390010000] 



1213420030000 


202236671666.66666 

□□□□□□□□□□□□□□□□□□□□□ : 


Cost function: 


It is the same as in simple linear regression; only the way h(x) is calculated differs.


Gradient descent: 


This too resembles simple linear regression. In simple linear regression, the equation that reduces the value of theta0 has no X term. Here, because x0 has been attached to theta0, the equation that reduces every theta value has one and the same form, as follows.
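Because every theta value follows the same update rule once x0 is included, the whole update can be written as one matrix expression. The sketch below is a minimal illustration and not the book's own program: the function names, the learning rate alpha, the iteration count and the tiny data set (y = 1 + 2x) are all assumptions.

```python
import numpy as np


def cost(X, y, theta):
    """Cost J = 1/(2m) * sum of squared errors, as in the program above."""
    m = y.size
    return np.sum((X @ theta - y) ** 2) / (2 * m)


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Update all theta values at once: theta -= alpha/m * X^T (X theta - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        error = X @ theta - y
        theta = theta - (alpha / m) * (X.T @ error)
    return theta


# Hypothetical data: first column is x0 = 1, second is the single feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = gradient_descent(X, y)
print(theta)
print(cost(X, y, theta))
```

With features on very different scales (such as square feet and rooms), a learning rate this large would diverge, which is one reason feature scaling matters.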




minimum cost 


Instead of using gradient descent, the theta that produces the minimum cost can be found directly through the following equation. But when the number of features is large, it is better to use gradient descent, because finding the transpose for that many features takes up a great deal of time.
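The equation itself is missing from this copy of the text. The direct method usually meant here is the normal equation, theta = (X^T X)^-1 X^T y, and the sketch below assumes that is the one intended. The function name and the small data set (y = 1 + 2x) are hypothetical.

```python
import numpy as np


def normal_equation(X, y):
    """Solve theta = (X^T X)^-1 X^T y; pinv also copes with a singular X^T X."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y


# Hypothetical data: first column is x0 = 1, second is the single feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = normal_equation(X, y)
print(theta)
```

No learning rate or iteration count is needed, but the matrix inversion grows expensive as the number of features increases.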


Feature Scaling: 


Notice that each feature here lies on a different numeric scale. Square feet range from 800 to 2400, while the number of rooms ranges from 2 to 5.

Square feet = 800, 1200, 2400
Rooms = 2, 3, 5

Feature scaling means normalizing the values of every column so that, instead of lying on different scales, they all fall between -1 and +1 or between 0 and 1. Mean normalization helps with this. It is as follows.

(particular value - mean of all values) / (maximum - minimum)

Square feet = (800-1600)/(2400-800), (1200-1600)/(2400-800), (2400-1600)/(2400-800)
            = -0.5, -0.25, 0.5

Rooms = (2-3.5)/(5-2), (3-3.5)/(5-2), (5-3.5)/(5-2)
      = -0.5, -0.16, 0.5

(This worked example takes 1600 and 3.5, the midpoints of each range, as the centre value; the arithmetic means of the two columns are about 1466.67 and 3.33.)
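The formula can be applied to a whole column at once. The following sketch uses the arithmetic mean of each column, so its figures differ slightly from the hand calculation above, which uses 1600 and 3.5 as the centre values. The function name mean_normalize is an assumption.

```python
import numpy as np


def mean_normalize(values):
    """(value - mean of all values) / (maximum - minimum) for each value."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / (values.max() - values.min())


sqft = mean_normalize([800, 1200, 2400])
rooms = mean_normalize([2, 3, 5])

print(sqft)
print(rooms)
```

After scaling, both columns are centred on zero and span a range of exactly 1.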


When gradient descent is used in multiple linear regression like this, the differing scale of each feature makes the plot form very narrow, elongated contours, so reaching the centre becomes very difficult. Once everything is normalized, the contours become more circular and the centre is easier to reach.




Pandas 


Pandas is a library that python provides for shaping data into a form suited to analysis. With it, source data held in various formats such as csv, txt and json can be read into a dataframe and adapted to our needs.

Here, the various factors that help determine the sale price of a house, and the prices determined on that basis, are given as a CSV file. This is called the training data. It is from this that we are going to build a model.

Before building a model we need to understand the training data. How many records there are, how many null values there are, which factors are the important ones that affect the sale price, how to remove the other factors that are not needed, how to replace null values with values of our choosing: all of this we are going to do with Pandas. This is called preprocessing / feature selection. The program for it is as follows.

https://gist.github.com/nithyadurai87/5fd84f40ce26eac65a8060ee2d15280a




import pandas as pd

# data can be downloaded from the url:
# https://www.kaggle.com/vikrishnan/boston-house-prices
df = pd.read_csv('data.csv')
target = 'SalePrice'

# Understanding data
print(df.shape)
print(df.columns)
print(df.head(5))
print(df.info())
print(df.describe())
print(df.groupby('LotShape').size())

# Dropping null value columns which cross the threshold
# (len(a) is the number of columns, not the number of rows)
a = df.isnull().sum()
print(a)
b = a[a > (0.05 * len(a))]
print(b)
df = df.drop(b.index, axis=1)
print(df.shape)

# Replacing null value columns (text) with most used value
a1 = df.select_dtypes(include=['object']).isnull().sum()
print(a1)
print(a1.index)
for i in a1.index:
    b1 = df[i].value_counts().index.tolist()
    print(b1)
    df[i] = df[i].fillna(b1[0])

# Replacing null value columns (int, float) with most used value
a2 = df.select_dtypes(include=['integer', 'float']).isnull().sum()
print(a2)
b2 = a2[a2 != 0].index
print(b2)
if len(b2) > 0:  # nothing to fill when no numeric column has nulls
    df = df.fillna(df[b2].mode().to_dict(orient='records')[0])

# Creating new columns from existing columns
print(df.shape)
a3 = df['YrSold'] - df['YearBuilt']
b3 = df['YrSold'] - df['YearRemodAdd']
df['Years Before Sale'] = a3
df['Years Since Remod'] = b3
print(df.shape)

# Dropping unwanted columns
df = df.drop(["Id", "MoSold", "SaleCondition", "SaleType",
              "YearBuilt", "YearRemodAdd"], axis=1)
print(df.shape)

# Dropping columns which has correlation with target less than threshold
x = df.select_dtypes(include=['integer', 'float']).corr()[target].abs()
print(x)
df = df.drop(x[x < 0.4].index, axis=1)
print(df.shape)

# Checking for the necessary features after dropping some columns
l1 = ["PID", "MS Subclass", "MS Zoning", "Street", "Alley", "Land Contour",
      "Lot Config", "Neighborhood", "Condition 1", "Condition 2", "Bldg Type",
      "House Style", "Roof Style", "Roof Matl", "Exterior 1st", "Exterior 2nd",
      "Mas Vnr Type", "Foundation", "Heating", "Central Air", "Garage Type",
      "Misc Feature", "Sale Type", "Sale Condition"]
l2 = []
for i in l1:
    if i in df.columns:
        l2.append(i)

# Getting rid of nominal columns with too many unique values
for i in l2:
    # The result of this comparison is not used, so every column in l2 is
    # dropped; put `if` in front of it to drop only columns with more than
    # 10 unique values.
    len(df[i].unique()) > 10
    df = df.drop(i, axis=1)

print(df.columns)

df.to_csv('training_data.csv', index=False)


Program explanation and output:

The data in the csv has been loaded by pandas into a dataframe named df. The number of rows and columns it contains can be found as follows.


print (df.shape) 


(1460, 81) 


The following command shows which columns are present.


print (df.columns) 




Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')


head(5) displays the first 5 records.

print(df.head(5))

   Id  MSSubClass MSZoning  ... SaleType SaleCondition SalePrice
0   1          60       RL  ...       WD        Normal    208500
1   2          20       RL  ...       WD        Normal    181500
2   3          60       RL  ...       WD        Normal    223500
3   4          70       RL  ...       WD       Abnorml    140000
4   5          60       RL  ...       WD        Normal    250000

[5 rows x 81 columns]

info() displays details about the structure of our dataframe.

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id               1460 non-null int64
MSSubClass       1460 non-null int64

SaleCondition    1460 non-null object
SalePrice        1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
None


describe() displays statistical details.

print(df.describe())

                Id   MSSubClass  ...       YrSold      SalePrice
count  1460.000000  1460.000000  ...  1460.000000    1460.000000
mean    730.500000    56.897260  ...  2007.815753  180921.195890
std     421.610009    42.300571  ...     1.328095   79442.502883
min       1.000000    20.000000  ...  2006.000000   34900.000000
25%     365.750000    20.000000  ...  2007.000000  129975.000000
50%     730.500000    50.000000  ...  2008.000000  163000.000000
75%    1095.250000    70.000000  ...  2009.000000  214000.000000
max    1460.000000   190.000000  ...  2010.000000  755000.000000

[8 rows x 38 columns]


groupby() groups the rows by the values in a column; size() then shows how many rows fall under each value.

print(df.groupby('LotShape').size())

LotShape
IR1    484
IR2     41
IR3     10
Reg    925
dtype: int64


This displays the number of null values in each column.


print (a) 


Id 0 

MSSubClass 0 

MSZoning 0 
LotFrontage 259 

LotArea 0 

Street 0 

Alley 1369 


LotShape 0 



LandContour 0 


Utilities 0 


PoolQC 1453 

Fence 1179 

MiscFeature 1406 

MiscVal 0 

MoSold 0 

YrSold 0 

SaleType 0 

SaleCondition 0 

SalePrice 0 

Length: 81, dtype: int64 


0.05 is the threshold for nulls, that is, it is defined as 5 null values per 100. In the program the cutoff is computed as 0.05*len(a), and len(a) is the number of columns (81), so every column with more than about 4 null values is identified and printed. These columns are then removed from the dataframe.




print (b) 


LotFrontage      259
Alley           1369
MasVnrType         8
MasVnrArea         8
BsmtQual          37
BsmtCond          37
BsmtExposure      38
BsmtFinType1      37
BsmtFinType2      38
FireplaceQu      690
GarageType        81
GarageYrBlt       81
GarageFinish      81
GarageQual        81
GarageCond        81
PoolQC          1453
Fence           1179
MiscFeature     1406
dtype: int64

After all 18 columns have been removed, the count of 81 can be seen to have come down to 63.


print (df.shape) 


(1460, 63) 


The text columns that have fewer null values than the threshold are displayed. include=['object'] refers to text columns.

print(a1)

MSZoning         0
Street           0


LotShape 0 


Electrical 1 

KitchenQual 0 

Functional 0 

PavedDrive 0 

SaleType 0 

SaleCondition 0 

dtype: int64 

print(a1.index)

Index(['MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities',
       'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st',
       'Exterior2nd', 'ExterQual', 'ExterCond', 'Foundation', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional',
       'PavedDrive', 'SaleType', 'SaleCondition'],
      dtype='object')


The number of times each value occurs in a column is found, and the values are turned into a list. The first value in the list is the one that occurs most often. It is with this value that the null values are filled.

print(b1)

['RL', 'RM', 'FV', 'RH', 'C (all)']
['Pave', 'Grvl']
['Reg', 'IR1', 'IR2', 'IR3']

['Y', 'N', 'P']
['WD', 'New', 'COD', 'ConLD', 'ConLI', 'ConLw', 'CWD', 'Oth', 'Con']
['Normal', 'Partial', 'Abnorml', 'Family', 'Alloca', 'AdjLand']
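The idea of filling nulls with the most frequent value is easier to see on a tiny table. The sketch below is an illustration only: the dataframe and its values are made up, and only the pandas calls (isnull, value_counts, fillna) mirror the program above.

```python
import pandas as pd

# Hypothetical miniature dataframe with one null in each column.
df = pd.DataFrame({
    "Electrical": ["SBrkr", "SBrkr", None, "FuseA"],
    "Rooms": [3.0, None, 3.0, 5.0],
})

for col in df.columns:
    if df[col].isnull().any():
        # value_counts() ignores nulls and sorts by frequency, highest first.
        most_used = df[col].value_counts().index[0]
        df[col] = df[col].fillna(most_used)

print(df)
```

Filling with the mode keeps the column's type intact, which is why it works for text and numeric columns alike.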




The numerical columns that have fewer null values than the threshold are filled with the value that occurs most often. include=['integer','float'] refers to numerical columns.

print(a2)


Id 0 


MSSubClass 0 


LotArea 0 


MoSold 0 

YrSold 0 


SalePrice 0 


dtype: int64 




print(b2)

Index([], dtype='object')

The values in two columns are compared, the difference between them is found, and it is added to the dataframe as a new column; this is done twice. The count of 63 columns can be seen to have changed to 65 once the new columns are added.

print (df.shape) 

( 1460 , 63 ) 

print (df.shape) 

( 1460 , 65 ) 


The names of a few columns that are not needed are given directly, and those columns are removed from the dataframe. The count can then be seen to have changed to 59.



print (df.shape) 


(1460, 59) 


The correlation between the numerical columns and the target column is found and displayed. Columns whose value is lower than the threshold of 0.4 are removed from the dataframe.

print (x) 

MSSubClass 0.084284 

LotFrontage 0.351799 

LotArea 0.263843 


SalePrice 1.000000 

Years Before Sale 0.523350 

Years Since Remod 0.509079 

Name: SalePrice, dtype: float64 
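The correlation filter can be tried on a small table first. This is a sketch with made-up data: the column names Area, Noise and Price are hypothetical, and only the corr / abs / drop pattern comes from the program above.

```python
import pandas as pd

# Hypothetical data: Area tracks Price exactly, Noise does not.
df = pd.DataFrame({
    "Area": [800, 1200, 1600, 2000, 2400],
    "Noise": [5, 1, 4, 1, 5],
    "Price": [80, 120, 160, 200, 240],
})
target = "Price"

# Absolute correlation of every numeric column with the target.
corr = df.corr()[target].abs()
print(corr)

# Drop the columns whose correlation falls below the threshold.
weak = corr[corr < 0.4].index
df = df.drop(weak, axis=1)
print(df.columns)
```

Taking the absolute value matters: a strongly negative correlation is just as useful for prediction as a strongly positive one.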


After the changes described above, a check is made to see whether the important columns we need are still present in the dataframe. Then columns holding too many distinct values are removed. Once these too have been removed, the number of columns can be seen to have changed to 38.

print (df.shape) 

(1460, 38) 

Then the columns that remain are displayed.

print(df.columns)

Index(['MSZoning', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle',
       'OverallQual', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd',
       'ExterQual', 'ExterCond', 'TotalBsmtSF', 'HeatingQC', 'CentralAir',
       'Electrical', '1stFlrSF', 'GrLivArea', 'FullBath', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'GarageCars', 'GarageArea',
       'PavedDrive', 'SalePrice', 'Years Before Sale', 'Years Since Remod'],
      dtype='object')


Finally, the values in the dataframe are saved as a .csv file under the name training_data. This will be the input for building the model. How to build a model from it is covered in the next section.



Model file handling 


Model Creation 


sklearn (sk for scikit) is a library in python. It contains algorithms for all kinds of models, such as linear, ensemble and neural networks, that fall under the categories of classification and regression. From it we take the algorithm called LinearRegression and teach it about our data. The program for this is as follows.


https://gist.github.com/nithyadurai87/91e74160ccb4ff51eef3188372a78b91 


import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
import joblib  # sklearn.externals.joblib is no longer available in current scikit-learn
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
from math import sqrt
import os

df = pd.read_csv('./training_data.csv')

# Move the target column to the end of the dataframe
i = list(df.columns.values)
i.pop(i.index('SalePrice'))
df0 = df[i + ['SalePrice']]

# Keep only the numerical columns
df = df0.select_dtypes(include=['integer', 'float'])
print(df.columns)

X = df[list(df.columns)[:-1]]
y = df['SalePrice']

X_train, X_test, y_train, y_test = train_test_split(X, y)
regressor = LinearRegression()
regressor.fit(X_train, y_train)

y_predictions = regressor.predict(X_test)

meanSquaredError = mean_squared_error(y_test, y_predictions)
rootMeanSquaredError = sqrt(meanSquaredError)

print("Number of predictions:", len(y_predictions))
print("Mean Squared Error:", meanSquaredError)
print("Root Mean Squared Error:", rootMeanSquaredError)
print("Scoring:", regressor.score(X_test, y_test))

plt.plot(y_predictions, y_test, 'r.')
plt.plot(y_predictions, y_predictions, 'k-')
plt.title('Parity Plot - Linear Regression')
plt.show()

plot = plt.scatter(y_predictions, (y_predictions - y_test), c='b')
plt.hlines(y=0, xmin=100000, xmax=400000)
plt.title('Residual Plot - Linear Regression')
plt.show()

joblib.dump(regressor, './salepricemodel.pkl')


Program output:

Index(['OverallQual', 'TotalBsmtSF', '1stFlrSF', 'GrLivArea', 'FullBath',
       'TotRmsAbvGrd', 'Fireplaces', 'GarageCars', 'GarageArea',
       'Years Before Sale', 'Years Since Remod', 'SalePrice'],
      dtype='object')

Number of predictions: 365 

Mean Squared Error: 981297922.7884247 

Root Mean Squared Error: 31325.675136993053 


Scoring: 0.818899237738355 



Program explanation:

1. The data in the training_data file has been loaded into df.

2. Everything that helps with the prediction is stored in X, and 'SalePrice', the value to be predicted, is stored in y. Before that, pop() removes the column to be predicted from the list of columns, and it is then added back as the last column. This lets us use [:-1] to take everything before the last column as X and the last column, 'SalePrice', as y.

3. fit() is used for teaching, and predict() for predicting.

4. score() is used to measure how well our algorithm has learnt.

5. train_test_split() divides our data in a 75% - 25% ratio. That is, 75% of the data is used for teaching and 25% for evaluating the result.

6. The mean_squared_error and sqrt functions find the average gap between the values predicted by our algorithm and the actual values. This gap is the 'Residual Error'. It is also drawn as a plot.

7. joblib saves our model as a .pkl file. This is a pickle file, a binary file format that supports serialization and de-serialization. How to use it to predict new data is covered in the next section.
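The steps in the list above can be rehearsed on synthetic data before running them on the house prices. The following is a minimal sketch under stated assumptions: the data is randomly generated from a known linear rule, and the seed, coefficients and variable names are all hypothetical.

```python
import numpy as np
from math import sqrt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 rows, 3 features, a known linear rule plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 4.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, size=200)

# Default split: 75% of the rows for training, 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regressor = LinearRegression().fit(X_train, y_train)   # teach
y_pred = regressor.predict(X_test)                     # predict

rmse = sqrt(mean_squared_error(y_test, y_pred))
score = regressor.score(X_test, y_test)                # R^2 on the test rows

print(len(X_train), len(X_test))
print(regressor.coef_, regressor.intercept_)
print(rmse, score)
```

Passing random_state makes the split repeatable; without it, as in the program above, the score changes a little on every run.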



Prediction 


Let us give only the first record in our file and ask for its price to be predicted. It is supplied through a file named input.json.


cat input.json 


{
"OverallQual":[7],
"TotalBsmtSF":[856],
"1stFlrSF":[856],
"GrLivArea":[1710],
"FullBath":[2],
"TotRmsAbvGrd":[8],
"Fireplaces":[0],
"GarageCars":[2],
"GarageArea":[548],
"Years Before Sale":[5],
"Years Since Remod":[5]
}


The program that calls predict() is as follows.

https://gist.github.com/nithyadurai87/4a31b465220448ab05b84d2227e4e8a5

import os
import json
import pandas as pd
import numpy
import joblib  # sklearn.externals.joblib is no longer available in current scikit-learn

s = pd.read_json('./input.json')
p = joblib.load("./salepricemodel.pkl")
r = p.predict(s)

print(str(r))



Program output:

[213357.65598157]

The actual SalePrice value is 208500, while our program gives the value 213357. This is acceptable: the score of our algorithm is 81%, so a difference of this size is to be expected.

1. joblib.load() de-serializes the file held in binary form and restores it as the algorithm.

2. The predict() that is then run takes the data in json form as input and predicts the output.
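The save, load and predict cycle can be seen end to end on a toy model. This is only a sketch: the training table, the file name toy_model.pkl and the temporary directory are assumptions made for illustration, not part of the house-price example.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data following price = 10*Area + 5*Rooms exactly.
train = pd.DataFrame({"Area": [1.0, 2.0, 3.0, 4.0],
                      "Rooms": [1.0, 1.0, 2.0, 2.0]})
price = 10.0 * train["Area"] + 5.0 * train["Rooms"]
model = LinearRegression().fit(train, price)

# Serialize the fitted model to a .pkl file, then load it back.
path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl")
joblib.dump(model, path)
restored = joblib.load(path)

# The new record must use the same column names, in the same order.
new_row = pd.DataFrame({"Area": [5.0], "Rooms": [3.0]})
prediction = restored.predict(new_row)
print(prediction)
```

The restored object behaves exactly like the original regressor, so the training data is not needed at prediction time.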


Next, let us see how the input and output of the prediction can be exposed as a Rest API.


Flask API 


Flask is used to expose the value predicted by our algorithm as an API. The program for this is as follows.

https://gist.github.com/nithyadurai87/9d04097e006e2fe6c7a96b1da643cb3a


import io
import os
import json
import pandas as pd
import numpy
import joblib  # sklearn.externals.joblib is no longer available in current scikit-learn

from flask import Flask, render_template, request, jsonify

app = Flask(__name__)

port = int(os.getenv('PORT', 5500))

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/api/salepricemodel', methods=['POST'])
def salepricemodel():
    if request.method == 'POST':
        try:
            post_data = request.get_json()
            json_data = json.dumps(post_data)
            # Wrap the JSON text in a file-like object for read_json
            s = pd.read_json(io.StringIO(json_data))
            p = joblib.load("./salepricemodel.pkl")
            r = p.predict(s)
            return str(r)
        except Exception as e:
            # A Flask view must return a string, not the exception object
            return str(e)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=port, debug=True)


Program output:

 * Serving Flask app "flask_api" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 690-746-333
 * Running on http://0.0.0.0:5500/ (Press CTRL+C to quit)

The API can be tested with a tool such as postman.



Model comparison 


To build our model we should not use linear regression alone; we should compare it with a few other algorithms and use whichever is best. The program for this is as follows. It fits our data to several algorithms and prints the Score and RMSE values for each. We can then choose the best one.

https://gist.github.com/nithyadurai87/9ecfcbf04593d245e26316d52b0708e1


import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error
from math import sqrt
import numpy as np
import os

df = pd.read_csv('./training_data.csv')

# Move the target column to the end and keep only numerical columns
i = list(df.columns.values)
i.pop(i.index('SalePrice'))
df0 = df[i + ['SalePrice']]
df = df0.select_dtypes(include=['integer', 'float'])

X = df[list(df.columns)[:-1]]
y = df['SalePrice']
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Every function below fits one algorithm and returns (score, RMSE)

def linear():
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def ridge():
    # normalize= was removed in scikit-learn 1.2; on newer versions drop it
    # and scale X beforehand (for example with StandardScaler)
    regressor = Ridge(alpha=.3, normalize=True)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def lasso():
    regressor = Lasso(alpha=0.00009, normalize=True)  # see the note in ridge()
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def elasticnet():
    regressor = ElasticNet(alpha=1, l1_ratio=0.5)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def randomforest():
    # The criterion argument is cut off in the source; the library default is used
    regressor = RandomForestRegressor(n_estimators=15, min_samples_split=15)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for RamdomForest", regressor.feature_importances_)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def perceptron():
    regressor = MLPRegressor(hidden_layer_sizes=(5000,), activation='relu', solver='adam', max_iter=1000)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Co-efficients of Perceptron", regressor.coefs_)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def decisiontree():
    regressor = DecisionTreeRegressor(min_samples_split=30, max_depth=None)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for DecisionTrees", regressor.feature_importances_)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def adaboost():
    regressor = AdaBoostRegressor(random_state=8, loss='exponential')
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for Adaboost", regressor.feature_importances_)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def extratrees():
    regressor = ExtraTreesRegressor(n_estimators=50)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for Extratrees", regressor.feature_importances_)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

def gradientboosting():
    # The default loss is least squares, which the source requested explicitly
    regressor = GradientBoostingRegressor(n_estimators=500, min_samples_split=15)
    regressor.fit(X_train, y_train)
    y_predictions = regressor.predict(X_test)
    print("Selected Features for Gradientboosting", regressor.feature_importances_)
    return (regressor.score(X_test, y_test), sqrt(mean_squared_error(y_test, y_predictions)))

print("Score, RMSE values")
print("Linear = ", linear())
print("Ridge = ", ridge())
print("Lasso = ", lasso())
print("ElasticNet = ", elasticnet())
print("RandomForest = ", randomforest())
print("Perceptron = ", perceptron())
print("DecisionTree = ", decisiontree())
print("AdaBoost = ", adaboost())
print("ExtraTrees = ", extratrees())
print("GradientBoosting = ", gradientboosting())



Program output:


Score, RMSE values 

Linear = (0.7437086925668539, 40067.32048747698) 

Ridge = (0.7426559924644496, 40149.523137601194) 

Lasso = (0.7437086997392647, 40067.31992682729) 

ElasticNet = (0.7427716507607811, 40140.499909601196) 
RandomForest = (0.7816174352942802, 36985.57224959144)
Perceptron = (0.7090884723574984, 42687.80529374248) 
DecisionTree = (0.7205230305007451, 41840.45264436496) 
AdaBoost = (0.7405881117926998, 40310.51057481991) 
ExtraTrees = (0.8112271823246542, 34386.90514804029) 
GradientBoosting = (0.770865727419495, 37885.095662535474) 


Selected Features for RamdomForest [0.61070268 0.04279095 0.04336447 

0.17066371 0.01107406 0.01329107 

0.0065515 0.03938371 0.02458596 0.02051551 0.01707638] 


Selected Features for DecisionTrees [0.75618387 0.03596786 0.02304119 
0.13037245 0.0022674 0. 0.00739768 0.01056845 0.01184136 0.01171254 
0.01064719] 


Selected Features for Adaboost [0.38413232 0.18988447 0.03844386 
0.12826885 0.03857277 0.03995005 

0.01059839 0.08066205 0.05036717 0.01473333 0.02438674] 


Selected Features for Extratrees [0.33168574 0.04675749 0.05913052 
0.11159271 0.05178125 0.02947481 



0.03966461 0.16786223 0.06241882 0.05316226 0.04646956] 


Selected Features for Gradientboosting [0.04426232 0.16359645 0.14768597 
0.25403034 0.02119119 0.04361512 

0.01825781 0.01626673 0.15891844 0.07188963 0.06028599] 


Co-efficients of Perceptron [array([[ 2.83519650e-01, 7.33024272e-03, 
2.80373628e-01, ..., -1.43939606e-03, -3.84913926e-02], 

[ 1.34495184e-01, 1.31687141e-02, 1.72078666e-04, ...,1.70666499e-23, 
-2.31494718e-02, -1.08758545e-02], 

[ 9.44490485e-02, -2.34835375e-02, 2.37798999e-02, ..., -1.74549692e-02, 
-2.70192753e-02, -3.67706290e-02], 

• • 

[ 1.59527225e-01, -3.19744701e-02, -1.22884400e-01, ..., -2.35994429e-26, 
-3.03880584e-02, -2.85251050e-02], 

[-3.63149939e-01, -4.05674884e-02, 2.66679331e-01, ..., -1.73628910e-02, 
7.40224353e-03, -6.89871249e-03], 

[-4.30743882e-01, 7.07948777e-03, 3.34518179e-01, ..., -1.74075111e-02, 
3.47755293e-02, -2.64627071e-02]]), 


array([[ 0.16789784],[-0.01864141],[ 0.20432696],...,[ 0.01739125], 
[-0.02779454],[-0.00476935]])] 


To declare one model the best, we must take into account not only its Score and RMSE value but also things such as the Threshold Limit and Sensitivity. We will look at these, and at the algorithms mentioned above, in detail later. Some of the algorithms above have also printed which features they relied on for the prediction. But this facility is not available for linear, ridge, lasso, elasticnet and the like. For such algorithms we have to select the features ourselves through the RFE technique and pass them in. This is covered in the section on 'feature selection'.
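As a preview of the RFE technique mentioned above, scikit-learn provides sklearn.feature_selection.RFE, which repeatedly fits an estimator and discards the weakest feature. The sketch below uses made-up data in which only the first and third of four features influence y; the seed and coefficients are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Hypothetical data: y depends only on features 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] + 5.0 * X[:, 2]

# Keep the 2 strongest features, eliminating one feature per round.
selector = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)

print(selector.support_)    # True for the features that were kept
print(selector.ranking_)    # 1 marks a selected feature

X_selected = selector.transform(X)
print(X_selected.shape)
```

RFE ranks features by the size of the fitted coefficients, so the features should be on comparable scales before it is applied.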



Improving Model score 


If the score of the model we have built is very low, we need to draw plots such as trend and parity plots to find out where it goes wrong. In the following example, the various factors that determine the price of a house, and the sale prices determined on that basis, are given for training. The score of the model we built from them has come out as 35. So trend and parity plots are drawn to find out where the actual price and the predicted price differ.


□□□□□□□□□□□□□□□□□□□□□□□□: 

https://gist.github.eom/nithyadurai87/ca54a4a8f59187cb988b5145d000c70c 

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
import joblib  # sklearn.externals.joblib was removed from recent scikit-learn
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
from math import sqrt
import os

df = pd.read_csv('./training_data.csv')

# All columns except the last are features; SalePrice is the target
X = df[list(df.columns)[:-1]]
y = df['SalePrice']

X_train, X_test, y_train, y_test = train_test_split(X, y)
regressor = LinearRegression()
regressor.fit(X_train, y_train)

y_predictions = regressor.predict(X_test)

meanSquaredError = mean_squared_error(y_test, y_predictions)
rootMeanSquaredError = sqrt(meanSquaredError)

print("Number of predictions:", len(y_predictions))
print("Mean Squared Error:", meanSquaredError)
print("Root Mean Squared Error:", rootMeanSquaredError)
print("Scoring:", regressor.score(X_test, y_test))

## TREND PLOT
# Compare the first 35 actual and predicted values point by point
y_test25 = y_test[:35]
y_predictions25 = y_predictions[:35]
myrange = [i for i in range(1, 36)]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.grid()
plt.plot(myrange, y_test25, marker='o')
plt.plot(myrange, y_predictions25, marker='o')
plt.title('Trend between Actual and Predicted - 35 samples')
ax.set_xlabel("No. of Data Points")
ax.set_ylabel("Values- SalePrice")
plt.legend(['Actual points', 'Predicted values'])
plt.savefig('TrendActualvsPredicted.png', dpi=100)
plt.show()

## PARITY PLOT
# The blue lines mark a band of +/- 50000 around the actual value
y_testp = y_test[:] + 50000
y_testm = y_test[:] - 50000
fig = plt.figure()
ax = fig.add_subplot(111)
ax.grid()
plt.plot(y_test, y_predictions, 'r.')
plt.plot(y_test, y_test, 'k-', color='green')
plt.plot(y_test, y_testp, color='blue')
plt.plot(y_test, y_testm, color='blue')
plt.title('Parity Plot')
ax.set_xlabel("Actual Values")
ax.set_ylabel("Predicted Values")
plt.legend(['Actual vs Predicted points', 'Actual value line', 'Threshold of 50000'])
plt.show()

## Data Distribution
fig = plt.figure()
plt.plot([i for i in range(1, 1461)], y, 'r.')
plt.title('Data Distribution')
plt.show()

# Count how many sale prices lie above / at or below 250000
a, b = 0, 0
for i in range(0, 1460):
    if (y[i] > 250000):
        a += 1
    else:
        b += 1
print(a, b)

#X = X[:600]
#y = y[:600]



The trend plot shows how far the actual price and the price predicted by the model differ from each other.


The parity plot sets up a threshold. That is, assuming that a price may deviate by up to 50000 above or below the actual value, it shows how many predicted prices fall within the threshold and how many fall outside it.
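The idea behind the parity plot can also be expressed as a simple count. The following is only a sketch: the helper name within_threshold and the sample numbers are invented for illustration, and in the script above one would pass y_test and y_predictions instead.

```python
import numpy as np

def within_threshold(actual, predicted, threshold=50000):
    """Return (inside, outside) counts for |predicted - actual| <= threshold."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    inside = int(np.sum(np.abs(predicted - actual) <= threshold))
    return inside, len(actual) - inside

# Hypothetical prices, only to show the call
actual = [120000, 250000, 310000, 90000]
predicted = [135000, 180000, 300000, 150000]
inside, outside = within_threshold(actual, predicted)
print(inside, outside)
```

A model whose points mostly land inside the band is doing well at that tolerance; a large "outside" count points to the kind of problem examined next.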


Next, a data distribution chart is drawn. The X axis carries the 1460 rows that were given, and the Y axis carries their sale prices. Up to about 600 records the sale prices are spread over a wide range. From 600 to 1000 records, however, the sale prices all lie within a narrow range. This is the reason for the model's low score. We have already seen that the data we supply should be spread out properly; here it is not.


So we take only the data that is well spread, build the model from it, and look at its score. After the lines X = df[list(df.columns)[:-1]], y = df['SalePrice'], it is enough to add the lines X = X[:600], y = y[:600]. Our model will then be built using only the data up to 600 records.


Output: 

Number of predictions: 365 

Mean Squared Error: 2312162517.277571 

Root Mean Squared Error: 48084.95104788578 

Scoring: 0.34729555622354125 

97 1363 


Finally, of the 1460 records given, the script counts how many values lie above 250000 and how many lie below. 1363 values lie below 250000 and only 97 values lie above it. So the data is not spread properly. These values are called outliers. How to remove such outliers is covered in the 'Outliers Removal' section.



Feature Selection 


Feature selection means finding out which of the many columns in a file actually determine the value we want to predict. Given a file with 400 or 500 columns, picking only the columns that help the prediction is feature selection. The columns we have should be classified into 3 types: process variables, manipulated variables & disturbance variables. Manipulated and disturbance variables are both input parameters, while the process variable is the output parameter.

• Manipulated Variables (MV) - The values in columns of this type can be changed by us. We can handle them according to our needs.

• Disturbance Variables (DV) - We cannot change these directly. Instead, their values change according to the values of the manipulated variables.

• Process Variables (PV) - These values depend on how the process behaves. Their values follow from the values of the other columns, so we have no means of changing them ourselves.


After the above, we should find the relationship each variable has with the other variables. This is called correlation. Its value lies between -1 and +1. -1 denotes a negative relationship and +1 denotes a positive relationship.



For example, suppose we take features such as "amount of food eaten", "time spent exercising" and "time spent reading books about weight loss" in order to predict "body weight". The values in the correlation matrix would then be as follows.


• Positive Correlation: the relationship between body weight and the amount of food eaten comes out as +1. As the amount eaten increases, the weight increases.


• Negative Correlation: the relationship between body weight and the time spent exercising comes out as -1. As the exercise time increases, the body weight decreases.


• Zero Correlation: the relationship with the time spent reading books about weight loss comes out as 0. That time has nothing to do with body weight.


• Besides these, for other features the value lies somewhere between -1 and 1, depending on the relationship they have.
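The three cases above can be reproduced with pandas. This is a minimal sketch with made-up numbers for five people; the column names are invented for the example and the data is deliberately constructed so that the correlations come out as exactly +1, -1 and 0.

```python
import pandas as pd

# Hypothetical observations, chosen so the three cases are clear-cut
df = pd.DataFrame({
    "food_eaten":     [1.0, 2.0, 3.0, 4.0, 5.0],
    "exercise_hours": [5.0, 4.0, 3.0, 2.0, 1.0],
    "reading_hours":  [2.0, 1.0, 3.0, 1.0, 2.0],
    "body_weight":    [50.0, 55.0, 60.0, 65.0, 70.0],
})

# DataFrame.corr() computes the pairwise Pearson correlation matrix
corr = df.corr().round(2)
print(corr["body_weight"])
```

Real data will rarely give such clean values; the point is only how to read the "body_weight" column of the matrix.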



Highly Correlated features (MV - DV) 


Which columns in the file data.csv are which kind of parameter can be learnt from a domain expert. For example, among 26 features named A to Z, let A, B, C, D, E, F be process parameters and the rest be manipulated & disturbance parameters. So all the process parameters are first removed from the dataframe. Then the correlation among the remaining manipulated & disturbance parameters is computed and written out both as a file and as a plot. Features with a very high positive or negative correlation are then removed from the dataframe. That is, whenever two features have a correlation of -.98, -.99, -1, .98, .99 or 1, one of them is removed. In this way the redundancies among the manipulated & disturbance parameters are found and removed one by one. Everything that remains is saved under the name training_data. This file is the input for finding the relationship between our process variable and the selected manipulated & disturbance variables. The next step is to find the relationship each of these has with the process variable we want to predict, and to remove the columns whose relationship is 0.


https://gist.github.com/nithyadurai87/5a43155d33cf5288204def23661704d0



import pandas as pd
import matplotlib.pyplot as plt
import numpy
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.feature_selection import RFE
from sklearn.datasets import make_friedman1

df = pd.read_csv('./data.csv')

# Dropping all process parameters
df = df.drop(["A", "B", "C", "D", "E", "F"], axis=1)

# Finding correlation between manipulated & disturbance variables
correlations = df.corr()
correlations = correlations.round(2)
correlations.to_csv('MV_DV_correlation.csv', index=False)

# Plot the correlation matrix as a coloured grid
fig = plt.figure()
g = fig.add_subplot(111)
cax = g.matshow(correlations, vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = numpy.arange(0, 20, 1)
g.set_xticks(ticks)
g.set_yticks(ticks)
g.set_xticklabels(list(df.columns))
g.set_yticklabels(list(df.columns))
plt.savefig('MV_DV_correlation.png')

# Removing parameters with high correlation
# Keep only the upper triangle so each pair is checked once
# (numpy.bool was removed from NumPy; the built-in bool is used instead)
upper = correlations.where(numpy.triu(numpy.ones(correlations.shape), k=1).astype(bool))
cols_to_drop = []
for i in upper.columns:
    if (any(upper[i] == -1) or any(upper[i] == -0.98) or any(upper[i] == -0.99)
            or any(upper[i] == 0.98) or any(upper[i] == 0.99) or any(upper[i] == 1)):
        cols_to_drop.append(i)
df = df.drop(cols_to_drop, axis=1)

print(df.shape, df.columns)
df.to_csv('./training_data.csv', index=False)


Output:


(20, 17) Index(['G', 'H', 'J', 'K', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'], dtype='object')











Zero Correlated features (PV - MV,DV)


Let us take "A" as the process parameter we want to predict. In the training_data file, "A" is appended as the last column and the file is given as input to the program. The relationship between A and each of the other parameters is then computed, and the MVs and DVs that have a 0 relationship with it are removed. Here the columns whose values are 0.1, 0.2, 0.3, 0.4, 0.5, that is, below 0.6, are removed (the script below uses 0.06 as the cut-off).


https://gist.github.com/nithyadurai87/e0cca6ec864405a032888244122a90d8


import pandas as pd
import matplotlib.pyplot as plt
import numpy
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.feature_selection import RFE
from sklearn.datasets import make_friedman1

df = pd.read_csv('./training_data.csv')
print(df.shape, df.columns)

# Dropping columns which have correlation with target less than threshold
target = "A"
correlations = df.corr()[target].abs()
correlations = correlations.round(2)
correlations.to_csv('./PV_MVDV_correlation.csv', index=False)
df = df.drop(correlations[correlations < 0.06].index, axis=1)

print(df.shape, df.columns)
df.to_csv('./training.csv', index=False)


Output:


(20, 18) Index(['G', 'H', 'J', 'K', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'A'], dtype='object')

(20, 17) Index(['G', 'H', 'J', 'K', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'A'], dtype='object')



Recursive Feature Elimination Technique 


This is called the RFE technique. Algorithms such as Randomforest, Decisiontree, Adaboost, Extratrees and gradient boosting have the ability to select their features by themselves. For algorithms such as linear regression, ridge, lasso and elasticnet, however, we have to select the right features ourselves with the help of such techniques. RFE takes an algorithm as its input and gives a ranking for every feature. We then select and use only the features that received rank 1.
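The last step, keeping only the rank 1 features, can be sketched as follows. This is an illustration on synthetic data: the column names f0..f9 and the choice n_features_to_select=5 are assumptions made for the example, not values taken from the text.

```python
import pandas as pd
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data; the column names are invented for the example
X_arr, y = make_friedman1(n_samples=100, n_features=10, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(10)])

selector = RFE(LinearRegression(), n_features_to_select=5)
selector.fit(X, y)

# ranking_ is 1 for every kept feature; support_ is the matching boolean mask
rank1_columns = list(X.columns[selector.ranking_ == 1])
X_selected = X[rank1_columns]
print(rank1_columns)
```

X_selected can then be passed to train_test_split and the regressor in place of the full feature set.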


https://gist.github.com/nithyadurai87/34ca5b0e8a9f5908276240eb099247ad


import pandas as pd
import matplotlib.pyplot as plt
import numpy
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.feature_selection import RFE
from sklearn.datasets import make_friedman1

df = pd.read_csv('./training.csv')

X = df[list(df.columns)[:-1]]
y = df['A']

X_train, X_test, y_train, y_test = train_test_split(X, y)

# A decision tree reports the importance of each feature by itself
regressor = DecisionTreeRegressor(min_samples_split=3, max_depth=None)
regressor.fit(X_train, y_train)
y_predictions = regressor.predict(X_test)
print("Selected Features for DecisionTree", regressor.feature_importances_)

# RFE Technique - Recursive Feature Elimination
# (demonstrated here on synthetic data from make_friedman1)
X, y = make_friedman1(n_samples=20, n_features=17, random_state=0)
selector = RFE(LinearRegression())
selector = selector.fit(X, y)
print("Selected Features for LinearRegression", selector.ranking_)


We can see that feature_importances_, applied to the decision tree, gives a ranking for all the features. It does not work on linear regression. So there we have to use RFE to produce the ranking, and then select and use only the features that come out with Rank 1.


Output:


Selected Features for DecisionTree [9.52359304e-04 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 6.15147906e-03 2.23327627e-03 7.70622020e-02
 0.00000000e+00 0.00000000e+00 1.10263284e-03 2.33946020e-04
 0.00000000e+00 0.00000000e+00 9.12264104e-01 0.00000000e+00]
Selected Features for LinearRegression [ 1  1 10  1  1  9  8  3  5  2  6  7  1  1  1  4  1]


Outliers Removal 


An outlier is a data point that lies far away from the others. In a series of values such as 5, 10, 15, 20 ... 75, if just one of them is 15676, that one is an outlier. We have to find such points and remove them.


In the example below, the outliers in each column of the file are found and shown as a plot. A boxplot or a violinplot is used for this.


https://gist.github.com/nithyadurai87/1756b2a5ec421fc3f36add04909cc517

import pandas as pd
import pylab
import numpy as np
from scipy import stats
from scipy.stats import kurtosis
from scipy.stats import skew
import matplotlib._pylab_helpers

df = pd.read_csv('./14_input_data.csv')

# Finding outlier in data: one box plot per column
for i in range(len(df.columns)):
    pylab.figure()
    pylab.boxplot(df[df.columns[i]])
    #pylab.violinplot(df[df.columns[i]])
    pylab.title(df[df.columns[i]].name)

# Collect every open figure so that each can be saved
list1 = []
for i in matplotlib._pylab_helpers.Gcf.get_all_fig_managers():
    list1.append(i.canvas.figure)
print(list1)

# Save each figure under the name of its column
for i, j in enumerate(list1):
    j.savefig(df[df.columns[i]].name)

# Removing outliers
z = np.abs(stats.zscore(df))
print(z)
print(np.where(z > 3))
print(z[53][9])

# Keep only the rows in which every column has a z score below 3
df1 = df[(z < 3).all(axis=1)]
print(df.shape)
print(df1.shape)


list1 holds the plot (figure) created for each column.


print(list1)


[<Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
<Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
<Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
<Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>,
<Figure size 640x480 with 1 Axes>, <Figure size 640x480 with 1 Axes>]


Then, using savefig(), the plot for each column is saved under the name of that column. Of the pictures, the first is the violin plot for 'SalePrice' and the second is the box plot for the same column. Using these we can see where the outliers lie. Here, for SalePrice, the values above 300000 and below 100000 show up as outliers.


Next, let us see how to remove the outliers. Z Score, IQR Score and the like are used for this. The Z Score tells us how far a data point lies from the mean of the data. The points that lie too far away are taken to be outliers.
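To make the two scores concrete, here is a small sketch on the series used earlier (5, 10, 15 ... 75 plus the stray value 15676). The threshold of 3 for the Z score and the factor 1.5 for the IQR rule are the commonly used conventions and are stated here as assumptions.

```python
import numpy as np

values = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
                   55, 60, 65, 70, 75, 15676], dtype=float)

# Z score: distance from the mean in units of standard deviation
z = np.abs((values - values.mean()) / values.std())
z_outliers = values[z > 3]

# IQR score: flag points lying more than 1.5 * IQR outside the quartiles
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

print(z_outliers, iqr_outliers)
```

Both rules single out 15676 here. On real data they can disagree, because the Z score is itself pulled around by extreme values while the quartiles are not.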


Taking 0 as the mean, how far each data point lies from it is given below.


print(z) 


[[0.65147924 0.45930254 0.79343379 ... 0.31172464 0.35100032 0.4732471 ] 
[0.07183611 0.46646492 0.25714043 ... 0.31172464 0.06073101 0.01235858] 
[0.65147924 0.31336875 0.62782603 ... 0.31172464 0.63172623 0.74302803] 

[0.65147924 0.21564122 0.06565646 ... 1.02685765 1.03391416 0.23194227] 
[0.79515147 0.04690528 0.21898188 ... 1.02685765 1.09005935 0.23192429] 
[0.79515147 0.45278362 0.2416147 ... 1.02685765 0.9216238 0.2319063 ]] 


With just the values above we cannot yet say which points are outliers. A threshold is needed. Generally 3 is used as the threshold, and everything that lies above 3 is an outlier. So the command to print only the outliers is as follows.


print(np.where(z > 3)) 


(array([ 53, 58, 112, 118, 151, 161, 166, 178, 178, 185, 185, 

185, 197, 224, 224, 224, 231, 278, 304, 309, 309, 313, 

321, 332, 336, 349, 375, 378, 389, 440, 440, 440, 473, 

477, 481, 496, 496, 496, 496, 515, 523, 523, 523, 527, 

529, 533, 581, 585, 591, 605, 608, 635, 635, 642, 664, 

691, 691, 691, 769, 769, 798, 803, 825, 897, 898, 910, 

1024, 1031, 1044, 1044, 1061, 1169, 1173, 1182, 1182, 1182, 1190, 


1230, 1268, 1298, 1298, 1298, 1298, 1298, 1298, 1350, 1353, 1373, 
1373, 1386], dtype=int64), array([9, 9, 9, 3, 9, 9, 6, 8, 9, 3, 5, 9, 3, 

1, 2, 9, 9, 9, 3, 6, 9, 9, 

9, 1, 9, 9, 0, 9, 9, 1, 2, 9, 9, 9, 9, 1, 2, 3, 9, 9, 1, 2, 3, 9, 

2, 0, 8, 9, 9, 6, 3, 3, 5, 6, 8, 1, 2, 3, 3, 5, 3, 5, 8, 5, 2, 5, 

2, 5, 1, 2, 8, 3, 5, 1, 2, 3, 8, 5, 3, 1, 2, 3, 5, 6, 8, 5, 3, 1, 

2, 5], dtype=int64)) 


Note that the output contains two arrays. The first array gives the row in which an outlier lies, and the second array gives its column. So when we give print(z[53][9]), we can see that the z score at row 53, column 9 is 3.647669390284779.


Finally, only the rows whose values are all below 3 are stored in a new dataframe; this is the data with the outliers removed.


df1 = df[(z < 3).all(axis=1)]


So we can see that the old dataframe has 1460 rows and the new dataframe has 1396 rows.


(1460, 10) 
(1396, 10) 



Exploratory Data Analysis


Looking in detail at how our data is made up is called Exploratory Data Analysis.



Univariate 


Analysing only the data in a single column is called univariate analysis. Analysing how the data in two columns are related to each other is called bivariate analysis. Looking at how several columns together influence a target column is called multi-variate analysis.


Histogram, density plot and box plot are the plot types that help most with univariate analysis. A histogram takes the values in a variable, divides them into bins, and shows how much data falls into each bin. In the example below, the column 'GrLivArea' holds the sqft values of many houses. They are divided into the bins 500, 1000, 1500 ... 3000, and the plot shows how many houses fall into each bin. Both matplotlib and seaborn provide plots. The histogram here is a plot provided by matplotlib, whereas the density plot is provided by seaborn.
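What a histogram counts can be seen without drawing anything. The sketch below bins a few hypothetical living-area values into the same 500-wide bins; the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical living-area values in square feet
areas = np.array([620, 850, 1100, 1250, 1400, 1710, 1800, 2100, 2600, 2950])
bin_edges = np.arange(500, 3001, 500)   # 500, 1000, ..., 3000

# counts[i] is the number of values falling between edges[i] and edges[i + 1]
counts, edges = np.histogram(areas, bins=bin_edges)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count} houses")
```

ax.hist() performs the same counting internally and then draws one bar per bin.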


A boxplot is also a plot type for analysing a single variable. It looks like a box. The line in the middle of the box is the median. The lines above and below the box show how far the data is spread. The few small dots seen beyond the ends of these lines are the outliers.


https://gist.github.com/nithyadurai87/5be067164741348c6a51d6af6d8d78b7 



import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("14_input_data.csv")
df = df.fillna(0)
df = df[:100]

# Histogram (matplotlib)
y = [i for i in range(0, 10)]
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title="Total Living Sq.Ft",
       ylabel='No of Houses', xlabel='Living Sq.Ft')
ax.hist(df['GrLivArea'])
plt.savefig('Histogram.jpg')

# Density plot (seaborn); distplot is deprecated in recent seaborn
# releases, where kdeplot is the suggested replacement
sns.distplot(df['GrLivArea'], hist=False, kde=True,
             kde_kws={'shade': True, 'linewidth': 3})
plt.savefig('DensityPlot.jpg')

# Box plot (matplotlib)
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title="Total Living Sq.Ft",
       ylabel='No of Houses', xlabel='Living Sq.Ft')
ax.boxplot(df['GrLivArea'])
plt.savefig('BoxPlot.jpg')


Bivariate 


Bi-variate analysis means drawing a plot to see how two variables are related. One variable is plotted against the other.


Here, how the sale price changes with the sqft of each house is shown using a scatter plot and a heatmap. There are two heatmap plots: one is provided by seaborn and the other by matplotlib.


A scatter plot shows the data as points. To mark the data we can use circles or a few other shapes instead of points.


A heatmap is a plot type that helps to draw 2 dimensional data. Here a plot of size 12*12 is drawn. Every value in the matrix is denoted by a colour. In general it helps us to see how our data is distributed. Both kinds of heatmap, from seaborn and from matplotlib, are given here.


https://gist.github.com/nithyadurai87/d93a853d86cf5500011cb41308dd1935



import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("14_input_data.csv")
df = df.fillna(0)
df = df[:500]

# Scatter plot of sale price against living area
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title='Living area vs Price of the house',
       xlabel='Price', ylabel='Area')
price = df['SalePrice'].tolist()
area = df['GrLivArea'].tolist()
ax.scatter(price, area)
plt.savefig('ScatterPlot.jpg')

# Heatmap drawn with seaborn
df2 = pd.DataFrame()
df2['sale'] = df['SalePrice']
df2['area'] = df['GrLivArea']
fig = plt.figure(figsize=(12, 12))
r = sns.heatmap(df2, cmap='BuPu')
plt.savefig('HeatMapSeaborn.jpg')

# Heatmap drawn with matplotlib (2D histogram)
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.set(title="Total Living Sq.Ft",
       ylabel='No of Houses', xlabel='Living Sq.Ft')
ax.hist2d(price, area, bins=100)
plt.savefig('HeatMapMatplotlib.jpg')



Scatter Plot 


HeatMap - Seaborn 


HeatMap - Matplotlib 


Multivariate 


Multi-variate analysis means looking at how the values of several columns together shape a target variable. Parallel coordinates is a plot type that helps to view multi dimensional data.


Here the plots are drawn using plotly and matplotlib. The plot shows how the data is spread for the variable 'SalePrice', which is treated as a categorical variable. Using it we can find out whether there is any trend. Note that when drawing with Plotly, the min and max values of each column are given as its range. The plot is saved as an html file in an interactive form.


https://gist.github.com/nithyadurai87/2b0bb469694d33c7d1472880f10f67e1


import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
import plotly
import plotly.graph_objs as go
import numpy as np

df = pd.read_csv("14_input_data.csv")

# Parallel coordinates with pandas / matplotlib
parallel_coordinates(df, 'SalePrice')
plt.savefig('ParallelCoordinates.jpg')

# Summary statistics (min, max, mean ...) used for the ranges below
desc_data = df.describe()
desc_data.to_csv('./metrics.csv')

X = df[list(df.columns)[:-1]]
y = df['SalePrice']

# Parallel coordinates with plotly; each dimension gets its own range
data = [
    go.Parcoords(
        line = dict(colorscale = 'Jet',
                    showscale = True,
                    reversescale = True,
                    cmin = -4000,
                    cmax = -100),
        dimensions = list([
            dict(range = [1,10],
                 label = 'OverallQual', values = df['OverallQual']),
            dict(range = [0,6110],
                 label = 'TotalBsmtSF', values = df['TotalBsmtSF']),
            dict(tickvals = [334,4692],
                 label = '1stFlrSF', values = df['1stFlrSF']),
            dict(range = [334,5642],
                 label = 'GrLivArea', values = df['GrLivArea']),
            dict(range = [0,3],
                 label = 'FullBath', values = df['FullBath']),
            dict(range = [2,14],
                 label = 'TotRmsAbvGrd', values = df['TotRmsAbvGrd']),
            dict(range = [0,3],
                 label = 'Fireplaces', values = df['Fireplaces']),
            dict(range = [0,4],
                 label = 'GarageCars', values = df['GarageCars']),
            dict(range = [0,1418],
                 label = 'GarageArea', values = df['GarageArea']),
            dict(range = [34900,555000],
                 label = 'SalePrice', values = df['SalePrice'])
        ])
    )
]

plotly.offline.plot(data, filename = './parallel_coordinates_plot.html', auto_open = True)



Polynomial Regression 


For data that cannot be fitted by a straight line, we use polynomial regression. In the example below, the square feet of a house and its price are given. Linear, 2nd order, 3rd order, 4th order & 5th order polynomials are examined in turn.


https://gist.github.com/nithyadurai87/b7d3bf7733b5d4a8d2c8b2d1b8dcb531


import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = pd.DataFrame([100,200,300,400,500,600], columns=['sqft'])
y = pd.DataFrame([543543,34543543,35435345,34534,34534534,345345], columns=['Price'])

# Plain linear regression (straight line)
lin = LinearRegression()
lin.fit(X, y)
plt.scatter(X, y, color = 'blue')
plt.plot(X, lin.predict(X), color = 'red')
plt.title('Linear Regression')
plt.xlabel('sqft')
plt.ylabel('Price')
plt.show()

# Polynomial regression of order 2, 3, 4 and 5
for i in [2,3,4,5]:
    poly = PolynomialFeatures(degree = i)
    X_poly = poly.fit_transform(X)
    # This call refits the transformer; fit_transform(X) below fits it on X again
    poly.fit(X_poly, y)
    lin2 = LinearRegression()
    lin2.fit(X_poly, y)
    plt.scatter(X, y, color = 'blue')
    plt.plot(X, lin2.predict(poly.fit_transform(X)), color = 'red')
    plt.title('Polynomial Regression')
    plt.xlabel('sqft')
    plt.ylabel('Price')
    plt.show()


With linear regression, the line does not fit any of the data points. This is called under fitting.


With the 2nd order, the square of the input is computed and the curve tries to fit the data. This is called a non-linear function; that is, it is not a straight line.


In the same way, with the 3rd order the cube of the input is computed, and we can see the curve move a little closer to the data.


Finally, with the 4th order the curve passes completely through all the data points in a non-linear way. This is called over fitting. Such over fitting is not correct either.


So we should choose for prediction whichever order follows the data in a non-linear way without over fitting. In this method the powers of a number are computed, so the equation is built from those powers. Because the numbers grow very large, feature scaling becomes important here.
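One way to compare the orders numerically, rather than by eye, is to hold back part of the data and compare the error on the training part with the error on the held-back part. The following is a sketch on synthetic data (a noisy sine curve, chosen only for illustration); the pipeline includes StandardScaler so that the x, x², x³ ... columns stay on a comparable scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, non-linear data (an assumption made for the example)
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

train_errors = {}
for degree in [1, 2, 3, 4, 5]:
    # Expand to polynomial terms, scale them, then fit a linear model
    model = make_pipeline(PolynomialFeatures(degree), StandardScaler(), LinearRegression())
    model.fit(X_train, y_train)
    train_errors[degree] = mean_squared_error(y_train, model.predict(X_train))
    test_error = mean_squared_error(y_test, model.predict(X_test))
    print(degree, round(train_errors[degree], 4), round(test_error, 4))
```

The training error can only go down as the order rises. The order worth keeping is the one where the test error is lowest; a test error that starts rising again while the training error keeps falling is the sign of over fitting.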


Underfitting - High bias 


The state in which the prediction line does not fit the data is called underfitting. It arises when we predict with too few features for a large amount of data. It is also called the high bias problem, because the model works from very little information. For example, when predicting for 50,000 records with just two features, the line will not fit the data at all. For this kind of problem, increasing the number of records does not help. The number of features has to be increased.

Overfitting - High variance 


We have already seen that underfitting can be avoided by adding more features. If too many are added, however, the state called overfitting arises, and the regularization parameter is what is added to avoid it. Overfitting arises when the number of records is low and the number of features is high. For example, when predicting for just 50 records with 250 features, the line fits every one of the data points far too closely. This is called high variance. If, to avoid it, the number of features is reduced too much, we are back at high bias. This is called the bias-variance tradeoff. To avoid these problems we should either reduce the number of features to the right amount or use regularization.


Regularization 


Regularization reduces the size of the parameter (the thetas) attached to each feature. So, without reducing the number of features, we can make each of them contribute less to the prediction. When it is combined with linear regression, the equation takes the following form.


Linear regression:


Here lambda is the regularization parameter. It is applied to all the features starting from 1 (j = 1 to n), because we have already seen that the value of x0 is always 1. So there is no need to reduce the value of theta 0.


Likewise, the value of lambda should be neither too high nor too low. If it is too low, overfitting is not avoided. If it is too high, it becomes a cause of bias. So it has to be set at the right level.


When regularization is combined with gradient descent, the equation takes the following form. Here too it is not applied to theta 0; regularization is applied from theta 1 onwards.


When regularization is combined with the formula used for finding the minimum cost, it takes the following form.
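For reference, the regularized cost function and gradient descent update are commonly written as below. This is the standard textbook form, given here on the assumption that it matches the notation of this chapter: m is the number of records, n the number of features, h is the hypothesis, α the learning rate and λ the regularization parameter.

```latex
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
            + \lambda\sum_{j=1}^{n}\theta_j^2\right]

\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}

\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
            + \frac{\lambda}{m}\,\theta_j\right], \qquad j = 1, \dots, n
```

Note that the penalty sum starts at j = 1, so theta 0 is updated without any regularization term.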


Normal Equation: 
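The regularized normal equation is usually written as theta = (XᵀX + λL)⁻¹ Xᵀy, where L is the identity matrix with its first diagonal entry set to 0 so that theta 0 is not penalised. The sketch below, on synthetic data invented for the example, computes theta this way with NumPy and compares it with scikit-learn's Ridge, which minimises the same objective.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: 3 features, known coefficients plus a little noise
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

lam = 1.0
Xb = np.hstack([np.ones((50, 1)), X])   # prepend the x0 = 1 column
L = np.eye(Xb.shape[1])
L[0, 0] = 0.0                           # do not penalise theta 0

# Solve (X^T X + lam * L) theta = X^T y
theta = np.linalg.solve(Xb.T @ Xb + lam * L, Xb.T @ y)

ridge = Ridge(alpha=lam).fit(X, y)
print(theta)
print(ridge.intercept_, ridge.coef_)
```

theta[0] corresponds to Ridge's intercept_ and theta[1:] to its coef_; the two results agree up to numerical precision.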


Logistic regression 


If our prediction, instead of giving out a value, falls into one of a set of categories, it is called logistic regression. The categorisation happens in two ways: binary and multiclass. Only the name of the algorithm contains the word regression; logistic regression is in fact an algorithm for classification.



Sigmoid function 


It predicts whether an event will happen or not, yes or no. Yes is taken as 1 and no as 0, so the prediction lies between 0 and 1.

As the graph shows, g(z), which is computed from the value of z, must lie between 0 and 1. The formula for it is 1/(1+e**-z). This is called the sigmoid function.
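A minimal sketch of the formula above, using only the standard library. The function name `sigmoid` is chosen here for illustration.

```python
import math


def sigmoid(z):
    """Return g(z) = 1 / (1 + e**-z), a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-5, 0, 5):
    print(z, round(sigmoid(z), 4))
```

Large negative z gives values close to 0, z = 0 gives exactly 0.5, and large positive z gives values close to 1. For very large negative inputs `math.exp` can overflow, so a production version would need a guarded form.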


So when h(x) is substituted for z, the equation that keeps the result within 0-1 takes the following form. This is the equation for logistic regression.

The following program predicts whether an email is spam or not.


https://gist.github.com/nithyadurai87/f09984303f976ca6eb8a64a4b7f0e391


import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# column 0 holds the label (ham/spam), column 1 holds the message text
df = pd.read_csv('./spam.csv', delimiter=',', header=None)

X_train_raw, X_test_raw, y_train, y_test = train_test_split(df[1], df[0])

# learn the vocabulary on the training text only, then reuse it for the test text
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_raw)
X_test = vectorizer.transform(X_test_raw)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
print(predictions)


['ham' 'ham' 'ham']



Decision Boundary 


h(x) always describes the 'yes' outcome (y = 1), so 1-h(x) describes 'no'. For example, if h(x) predicts a 70% chance of rain tomorrow, the remaining 30% is the predicted chance of no rain.

When the data is plotted on a graph and is spread out as shown, the decision boundary is what decides beyond which point to predict 'yes' and where to predict 'no'. The decision boundary always depends on the theta values. When the values -3, 1, 1 are substituted for theta0, theta1, theta2, the decision boundary says that x1 + x2 must be at least 3 in order to predict h(x)=1.
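The boundary for theta = (-3, 1, 1) can be checked with a short sketch. The helper names `h` and `predict` are illustrative, and the 0.5 cut-off is the usual convention, assumed here.

```python
import math

THETA = (-3.0, 1.0, 1.0)  # theta0, theta1, theta2 from the text


def h(x1, x2):
    """Sigmoid of theta0 + theta1*x1 + theta2*x2."""
    z = THETA[0] + THETA[1] * x1 + THETA[2] * x2
    return 1.0 / (1.0 + math.exp(-z))


def predict(x1, x2):
    """Predict 1 ('yes') when h(x) >= 0.5, which is the same as x1 + x2 >= 3."""
    return 1 if h(x1, x2) >= 0.5 else 0


print(predict(1, 1))      # x1 + x2 = 2, below the boundary
print(predict(2, 2))      # x1 + x2 = 4, above the boundary
print(predict(1.5, 1.5))  # exactly on the boundary
```

Points with x1 + x2 below 3 fall on the 'no' side, and points at or above 3 fall on the 'yes' side.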


When the data is spread in a non-linear way (in a circle), the values -1, 0, 0, 1, 1 are substituted into a second-order polynomial equation, and the boundary is taken at 1. This is also called a threshold classifier.


Cost function 


If in reality it will rain tomorrow but the model predicts that it will not, that is an error. Likewise, predicting rain when it will not rain is also an error. In other words, an error occurs when 1 is predicted as 0 or when 0 is predicted as 1. How badly such a prediction misses cannot be expressed as a simple count; the penalty can grow without bound (infinity).

The graphs are as follows: with h(x) on the x axis, -log(h(x)) is the curve whose y value heads towards infinity.

That is,

when 1 is predicted as 0, cost = -log(h(x)), and

when 0 is predicted as 1, cost = -log(1-h(x)).

So the single equation for the cost is cost = -[ y.log(h(x)) + (1-y).log(1-h(x)) ]. Substitute y=1 and y=0 in turn.

When y = 1,

cost = -[ y.log(h(x)) + (1-y).log(1-h(x)) ]

= -[ 1.log(h(x)) + (1-1).log(1-h(x)) ]

= -[ log(h(x)) + 0 ]

= -log(h(x))

When y = 0,

cost = -[ y.log(h(x)) + (1-y).log(1-h(x)) ]

= -[ 0.log(h(x)) + (1-0).log(1-h(x)) ]

= -[ 0 + 1.log(1-h(x)) ]

= -log(1-h(x))
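The two cases above can be checked numerically. This is a small sketch using the standard library; the function name `cost` is illustrative and h must lie strictly between 0 and 1.

```python
import math


def cost(h, y):
    """Logistic cost for one record: -[y*log(h) + (1-y)*log(1-h)]."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))


# Actual value is 1: a confident correct prediction is cheap, a confident wrong one is expensive.
print(cost(0.9, 1), cost(0.1, 1))
# Actual value is 0: the situation is mirrored.
print(cost(0.1, 0), cost(0.9, 0))
```

As h(x) approaches the wrong extreme the cost keeps growing, which is the "infinity" behaviour described in the text.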


Here the contour plot is not bowl-shaped; it contains some curves with rises and dips. This is called a non-convex function. That is, the graph for regression has only one global optimum, but the graph for classification has several local optima. Because only the two values 'yes' and 'no' are predicted in turn, local optimums exist. We can use gradient descent for such a function too.

The gradient descent equation is the same as the one for multiple linear regression, with one difference: here h(x) is not just theta-transpose.x but the sigmoid function applied to it.



Classification accuracy 


In classification it can happen that 'no' is predicted when there really is a chance of rain tomorrow, or that 'yes' is predicted when there is none. Accuracy measures for how many records the prediction was correct.

Suppose predictions are made for ten days. y_true holds, as 1 and 0, whether it really rained or not. The predictions are in y_pred. Comparing the two, note that only the second, sixth and seventh predictions differ. So out of 10 records only 3 are wrong, and the accuracy comes to 70%.


https://gist.github.com/nithyadurai87/7668ce262ed9070d89b158bb7f13c5cb


from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 0, 0, 0, 1, 1, 1]

print('Accuracy:', accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(precision_recall_fscore_support(y_true, y_pred))

# draw the confusion matrix as a coloured grid
plt.matshow(confusion_matrix(y_true, y_pred))
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()


Accuracy: 0.7
[[4 1]
 [2 3]]
(array([0.66666667, 0.75]), array([0.8, 0.6]), array([0.72727273, 0.66666667]), array([5, 5], dtype=int64))



Confusion Matrix 


It is built according to the following rules.

A value of 0 predicted as 1 = False Positive

A value of 1 predicted as 0 = False Negative

A value of 1 predicted as 1 = True Positive

A value of 0 predicted as 0 = True Negative


Precision, Recall & F1 score


Precision (P) accounts for the records that were wrongly predicted as 'yes' (false positives), and Recall (R) accounts for the records that were wrongly predicted as 'no' (false negatives). The F score combines these two values into a single number.

The formulas are as follows:

P = True Positive / (True Positive + False Positive)

R = True Positive / (True Positive + False Negative)

F score = 2PR / (P + R)
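The three formulas can be worked through by hand on the y_true and y_pred lists used in the accuracy example. This is a plain-Python sketch; the variable names are illustrative.

```python
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 0, 0, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # 1 predicted as 1
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # 0 predicted as 1
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # 1 predicted as 0
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # 0 predicted as 0

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_score = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)
print(precision, recall, round(f_score, 8))
```

The results agree with the values that scikit-learn reported for class 1 in the earlier output (0.75, 0.6 and 0.66666667).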


Let us now see why finding these values matters. As an example, take a test that decides, from the size of a tumor in the brain, whether it is a cancerous tumor or not.

Here the result will be 'yes' for only a few people. For the great majority the result will be 'no'. Data with such lopsided results is called "skewed classes". When a model trained on this data predicts from a real tumor size, it will mostly output 'no' in place of 'yes'. Precision and recall help to detect this.


Trading-off between Precision & Recall 


Suppose the threshold is set so that a tumor larger than 5mm is treated as a cancerous tumor. If we now wrongly tell a person with an ordinary tumor slightly above that size that 'this is a cancerous tumor', he will have to undergo many treatments unnecessarily (a false positive, which lowers precision).

So, in order to say 'yes' only when we are sure, we raise the threshold to above 7mm. Now a person who comes with a cancerous tumor of 6mm may be told 'no' (a false negative, which lowers recall). As a result he may neglect it and die.

If we want to increase precision, recall falls; if we want to increase recall, precision falls. This is called the trading-off between precision & recall.
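The effect of moving the threshold can be seen on a toy example. The tumor sizes and labels below are made up purely for illustration, and the function name is hypothetical.

```python
def precision_recall(sizes_mm, is_malignant, threshold_mm):
    """Predict 'yes' when size > threshold and return (precision, recall)."""
    tp = fp = fn = 0
    for size, actual in zip(sizes_mm, is_malignant):
        predicted = 1 if size > threshold_mm else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: tumor size in mm and whether it was really cancerous.
sizes_mm = [4.0, 5.2, 5.5, 6.0, 6.5, 7.2, 8.0, 9.0]
is_malignant = [0, 0, 0, 1, 0, 1, 1, 1]

p5, r5 = precision_recall(sizes_mm, is_malignant, 5)
p7, r7 = precision_recall(sizes_mm, is_malignant, 7)
print(p5, r5)
print(p7, r7)
```

On this data the 5mm threshold catches every cancerous tumor but raises several false alarms, while the 7mm threshold raises no false alarm but misses the 6mm case.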


Multi-class classification 


When there are several classes instead of only the two classes 0 and 1, predicting which of them a new item falls under is multi-class classification. As many logistic predictions are made as there are classes. The incoming item is then scored against all of them and assigned to the class that fits best.

The graph has four categories: red, orange, green and yellow.

A hypothesis is created for predicting red. h(x) = 1 denotes red. Everything other than red is denoted by 0.

A hypothesis is created for predicting orange. h(x) = 1 denotes orange. Everything other than orange is denoted by 0.

Hypotheses are created for the other colours in the same way.

Then, for a new item, suppose the probability of being predicted as red is 30%, as orange 40%, as green 60% and as yellow 50%. The item falls under the class with the highest probability. This is multi-class classification.
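The final selection step can be sketched in a couple of lines, using the four probabilities from the example above. The dictionary name is illustrative.

```python
# Probability reported by each one-vs-rest hypothesis for the new item.
class_probability = {'red': 0.30, 'orange': 0.40, 'green': 0.60, 'yellow': 0.50}

# The item is assigned to the class whose hypothesis gives the highest value.
predicted_class = max(class_probability, key=class_probability.get)
print(predicted_class)
```

Note that the four values need not add up to 100%, because each hypothesis is trained separately as its own yes/no problem.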


Decision tree, Gaussian NB, KNN and SVC are algorithms that support this kind of multi-class problem. The following is a multi-class classification that decides whether a flower is a jasmine, a rose or a lotus. It is run through several algorithms. Among them we can choose the one with the highest score and precision & recall.


https://gist.github.com/nithyadurai87/aaded978eb7e545006ed6117c97b86b3


from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv('./flowers.csv')

# every column except the last is a feature; 'Flower' is the label
X = df[list(df.columns)[:-1]]
y = df['Flower']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# decision tree
tree = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
tree_predictions = tree.predict(X_test)
print(tree.score(X_test, y_test))
print(confusion_matrix(y_test, tree_predictions))
print(precision_recall_fscore_support(y_test, tree_predictions))

# support vector classifier
svc = SVC(kernel='linear', C=1).fit(X_train, y_train)
svc_predictions = svc.predict(X_test)
print(svc.score(X_test, y_test))
print(confusion_matrix(y_test, svc_predictions))
print(precision_recall_fscore_support(y_test, svc_predictions))

# k nearest neighbours
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
knn_predictions = knn.predict(X_test)
print(knn.score(X_test, y_test))
print(confusion_matrix(y_test, knn_predictions))
print(precision_recall_fscore_support(y_test, knn_predictions))

# Gaussian naive Bayes
gnb = GaussianNB().fit(X_train, y_train)
gnb_predictions = gnb.predict(X_test)
print(gnb.score(X_test, y_test))
print(confusion_matrix(y_test, gnb_predictions))
print(precision_recall_fscore_support(y_test, gnb_predictions))

0.8947368421052632
[[15  1  0]
 [ 3  6  0]
 [ 0  0 13]]
(array([0.83333333, 0.85714286, 1. ]), array([0.9375, 0.66666667, 1. ]), array([0.88235294, 0.75, 1. ]), array([16, 9, 13], dtype=int64))


0.9736842105263158
[[15  1  0]
 [ 0  9  0]
 [ 0  0 13]]
(array([1., 0.9, 1. ]), array([0.9375, 1., 1. ]), array([0.96774194, 0.94736842, 1. ]), array([16, 9, 13], dtype=int64))


0.9736842105263158
[[15  1  0]
 [ 0  9  0]
 [ 0  0 13]]
(array([1., 0.9, 1. ]), array([0.9375, 1., 1. ]), array([0.96774194, 0.94736842, 1. ]), array([16, 9, 13], dtype=int64))


1.0
[[16  0  0]
 [ 0  9  0]
 [ 0  0 13]]
(array([1., 1., 1.]), array([1., 1., 1.]), array([1., 1., 1.]), array([16, 9, 13], dtype=int64))


The following uses the MultinomialNB algorithm to predict, from the words in a customer's complaint, which category the complaint falls under.


https://gist.github.com/nithyadurai87/3ce9dab55025fe1fd41b4da48d3fcbd8


import pandas as pd
from io import StringIO
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# on_bad_lines='skip' is the current spelling of the older error_bad_lines=False
df = pd.read_csv('./Consumer_Complaints.csv', sep=',', on_bad_lines='skip', index_col=False, dtype=str)

# keep only the rows that have an Issue text
df = df[pd.notnull(df['Issue'])]

# bar chart: number of complaints per product
fig = plt.figure(figsize=(8, 6))
df.groupby('Product').Issue.count().plot.bar(ylim=0)
plt.show()

X_train, X_test, y_train, y_test = train_test_split(df['Issue'], df['Product'], random_state=0)

c = CountVectorizer()
clf = MultinomialNB().fit(TfidfTransformer().fit_transform(c.fit_transform(X_train)), y_train)

print(clf.predict(c.transform(["This company refuses to provide me verification and validation of debt per my right under the FDCPA. I do not believe this debt is mine."])))

tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2), stop_words='english')
features = tfidf.fit_transform(df.Issue).toarray()
print(features)

# give every product a numeric category id
df['category_id'] = df['Product'].factorize()[0]
pro_cat = df[['Product', 'category_id']].drop_duplicates().sort_values('category_id')
print(pro_cat)

# for each category, rank the features by their chi2 score against that category
for i, j in sorted(dict(pro_cat.values).items()):
    indices = np.argsort(chi2(features, df.category_id == j)[0])
    print(indices)
    # get_feature_names_out() replaces get_feature_names() in recent scikit-learn
    feature_names = np.array(tfidf.get_feature_names_out())[indices]
    unigrams = [i for i in feature_names if len(i.split(' ')) == 1]
    bigrams = [i for i in feature_names if len(i.split(' ')) == 2]
    print(i)
    print("unigrams:", ','.join(unigrams[:5]))
    print("bigrams:", ','.join(bigrams[:5]))


The number of complaints recorded under each product is shown as a bar chart.

[Figure: bar chart of the number of complaints per product]

The data is then split into training and test sets (train_test_split uses a 75-25 split when no size is given) and tested.


Through TfidfVectorizer all the words in the complaints are stored as features. Then chi2 is used to produce, for each category, the list of words related to it. From that list, the entries made of a single word are stored under the name unigrams and the entries made of two words are stored under the name bigrams.



Vectors 


We have already seen that a classification problem predicts the values 'yes' and 'no'. These are denoted by 1 and 0 respectively. Sometimes we have to give sentences, images or sounds as the input. In such cases they must be converted into 1's & 0's. Below we look at the various kinds of vectors that sklearn offers for this and how they are used.

A collection that holds several sentences is called a corpus. DictVectorizer() and CountVectorizer() are used to convert everything in a corpus into 0's & 1's.

In the example below two corpora, corpus1 and corpus2, are given. The first is the example for DictVectorizer() and the second is the example for CountVectorizer(). Next, the variable vectors holds the encoded vectors for the sentences in corpus2. Using these we will see how to find the euclidean distance between two vectors.


https://gist.github.com/nithyadurai87/f3fff58ab7272279ef069689fc391dec


from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics.pairwise import euclidean_distances

corpus1 = [{'Gender': 'Male'}, {'Gender': 'Female'}, {'Gender': 'Transgender'}, {'Gender': 'Male'}, {'Gender': 'Female'}]

corpus2 = ['Bird is a Peacock Bird', 'Peacock dances very well', 'It eats variety of seeds', 'Cumin seed was eaten by it once']

# the CountVectorizer encoding of corpus2, one row per sentence
vectors = [[2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
           [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
           [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0]]

# one-hot encoding
v1 = DictVectorizer()
print(v1.fit_transform(corpus1).toarray())
print(v1.vocabulary_)

# bag-of-words (term frequencies, binary frequencies)
v2 = CountVectorizer()
print(v2.fit_transform(corpus2).todense())
print(v2.vocabulary_)

print(TfidfVectorizer().fit_transform(corpus2).todense())

print(HashingVectorizer(n_features=6).transform(corpus2).todense())

print(euclidean_distances([vectors[0]], [vectors[1]]))
print(euclidean_distances([vectors[0]], [vectors[2]]))
print(euclidean_distances([vectors[0]], [vectors[3]]))


1. DictVectorizer() converts a categorical variable into 1's & 0's. The categorical variable 'Gender' takes the values 'Male', 'Female' and 'Transgender'. A dictionary is built from these unique values. Then, for the 3 words and the 5 records that carry them, a matrix of dimension 5*3 is created. Each row of the matrix stands for one record: the position of the word that occurs in it, as given by the dictionary, is set to 1 and every other position is set to 0. Encoding a vector in this way is called one-hot encoding.


print(v1.fit_transform(corpus1).toarray())


[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]

fit_transform() takes our corpus as the input and creates the vector for it. toarray() and todense() return that vector in dense form. vocabulary_ holds the dictionary that was used to create the vector.

print(v1.vocabulary_)


{'Gender=Male': 1, 'Gender=Female': 0, 'Gender=Transgender': 2}


2. CountVectorizer() converts all the given sentences into 1's & 0's. Our corpus has 4 sentences and 17 unique words, so a matrix of dimension 4*17 is created. In each sentence, a word that occurs is marked 1 and a word that does not occur is marked 0. This is called bag of words.

The encoded result cannot tell us in which order the words occurred. The letters in the words are converted to lowercase and the words are then turned into tokens. Tokenization means splitting the text at whitespace and keeping the words of two or more letters as tokens. Tokens are the words that occur in the document.



'Bird is a Peacock Bird','Peacock dances very well','It eats variety of seeds','Cumin seed was eaten by it once'

print(v2.fit_transform(corpus2).todense())

[[2 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1]
 [0 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 0]
 [0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0]]


The result can be expressed in two ways, as binary frequency and as term frequency. Binary outputs only 1's & 0's. Term frequency outputs how many times each word occurs. Since Bird occurs twice in the first sentence, note that 2 is output for it.

In the vocabulary we can see the 17 unique words that were picked up (0 to 16). Bird, Peacock and it each occur twice in the corpus, but each is stored only once. Likewise, since the vocabulary is not case-sensitive, it and It are taken as one word. Also, 'a' is not taken as a word.

print(v2.vocabulary_)


{'bird': 0, 'is': 6, 'peacock': 10, 'dances': 3, 'very': 14, 'well': 16, 'it': 7, 'eats': 5, 'variety': 13, 'of': 8, 'seeds': 12, 'cumin': 2, 'seed': 11, 'was': 15, 'eaten': 4, 'by': 1, 'once': 9}
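The bag-of-words steps described above (lowercase, keep tokens of two or more characters, count per sentence) can be sketched without scikit-learn. The regular expression mirrors CountVectorizer's default token pattern; the variable names are illustrative.

```python
import re

corpus = ['Bird is a Peacock Bird', 'Peacock dances very well',
          'It eats variety of seeds', 'Cumin seed was eaten by it once']

# Lowercase each sentence and keep only tokens with two or more word characters.
tokens = [re.findall(r"\b\w\w+\b", sentence.lower()) for sentence in corpus]

# Sorted unique words, each mapped to its column index.
vocabulary = {word: index
              for index, word in enumerate(sorted({w for t in tokens for w in t}))}

# One row per sentence, one count per vocabulary word (term frequency).
rows = [[t.count(word) for word in vocabulary] for t in tokens]

print(vocabulary)
for row in rows:
    print(row)
```

The single-letter word 'a' is dropped by the token pattern, 'It' and 'it' share one column, and 'bird' gets the count 2 in the first row, matching the output shown above.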


3. TfidfVectorizer() normalizes the term frequency vector and outputs a weight for each frequency. Instead of printing a raw count such as 2, it scales every row to unit Euclidean length before printing it; this is called L2 normalization.


print(TfidfVectorizer().fit_transform(corpus2).todense())


[[0.84352956 0.         0.         0.         0.         0.
  0.42176478 0.         0.         0.         0.3325242  0.
  0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.52547275 0.         0.
  0.         0.         0.         0.         0.41428875 0.
  0.         0.         0.52547275 0.         0.52547275]
 [0.         0.         0.         0.         0.         0.46516193
  0.         0.36673901 0.46516193 0.         0.         0.
  0.46516193 0.46516193 0.         0.         0.        ]
 [0.         0.38861429 0.38861429 0.         0.38861429 0.
  0.         0.30638797 0.         0.38861429 0.         0.38861429
  0.         0.         0.         0.38861429 0.        ]]
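The first row of this matrix can be reproduced by hand. The sketch below assumes scikit-learn's default smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization; the counts and document frequencies are read off the CountVectorizer output for corpus2.

```python
import math

n_docs = 4                                        # sentences in corpus2
counts = {'bird': 2, 'is': 1, 'peacock': 1}       # term counts in the first sentence
doc_freq = {'bird': 1, 'is': 1, 'peacock': 2}     # number of sentences containing each word

# term frequency multiplied by the smoothed inverse document frequency
weights = {word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
           for word, count in counts.items()}

# L2 normalization: divide by the Euclidean length of the row
length = math.sqrt(sum(value * value for value in weights.values()))
tfidf = {word: value / length for word, value in weights.items()}

for word, value in tfidf.items():
    print(word, round(value, 8))
```

'peacock' ends up with a lower weight than 'is' even though both occur once, because 'peacock' also appears in another sentence and is therefore less distinctive.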



4. HashingVectorizer() creates the vector directly, without the help of a dictionary. Both the dict and count vectorizers above work in 2 steps: they first build the dictionary needed to create the vector, and then create the vector in the next step. Creating the vector directly in a single step, without that first step, is called the Hashing Trick. The reason for it is that as the size of the dictionary grows, the amount of memory needed to store it grows too. This kind of vector is used to avoid that.


print(HashingVectorizer(n_features=6).transform(corpus2).todense())


[[ 0.         -0.70710678 -0.70710678  0.          0.          0.        ]
 [ 0.          0.         -0.81649658 -0.40824829  0.40824829  0.        ]
 [ 0.75592895  0.         -0.37796447  0.         -0.37796447 -0.37796447]
 [ 0.25819889  0.77459667  0.         -0.51639778  0.          0.25819889]]
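The idea behind the hashing trick can be sketched as follows. This is only an illustration: it uses `zlib.crc32` as a stand-in hash and plain counts, whereas scikit-learn uses a signed MurmurHash3 and normalizes each row, so the numbers will not match the output above. The function name is hypothetical.

```python
import re
import zlib


def hashed_counts(sentence, n_features=6):
    """Map each token straight to a column with a hash; no dictionary is stored."""
    vector = [0] * n_features
    for token in re.findall(r"\b\w\w+\b", sentence.lower()):
        column = zlib.crc32(token.encode("utf-8")) % n_features
        vector[column] += 1
    return vector


first = hashed_counts('Bird is a Peacock Bird')
again = hashed_counts('Bird is a Peacock Bird')
print(first)
```

The width of the vector is fixed in advance (6 here), so memory use does not grow with the number of distinct words; the price is that different words can collide in the same column.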


5. euclidean_distances helps to find how large the difference is between two encoded sentences. In the example above we can see that the difference between the first two sentences is small, the difference from the 3rd sentence is larger, and the difference from the 4th sentence is larger still.


print(euclidean_distances([vectors[0]], [vectors[1]]))
[[2.82842712]]

print(euclidean_distances([vectors[0]], [vectors[2]]))
[[3.31662479]]

print(euclidean_distances([vectors[0]], [vectors[3]]))
[[3.60555128]]
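The same distances can be computed by hand as the square root of the summed squared differences. This is a plain-Python sketch; the function name is illustrative and the rows are the encoded sentences of corpus2.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


vectors = [[2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
           [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
           [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0]]

distances = [euclidean(vectors[0], other) for other in vectors[1:]]
print([round(d, 8) for d in distances])
```

The first pair is closest because both sentences contain 'peacock'; the first and fourth sentences share no word at all.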



Natural Language Toolkit 


In the vector creation we have seen so far, even when a sentence contains only one or two words, the vector still carries 0's for all the words that do not occur. This makes the vector large. A vector that holds mostly 0's in this way is called a sparse vector. For example, if a document has sentences from several fields such as cinema, sports and politics, then when each of them is converted into a vector, a line about politics will not contain the words of cinema, and a line about cinema will not contain the words of sports. Seen this way, every line stores many 0's unnecessarily. This raises 2 main problems.

First, it takes up more memory & space. SciPy, which builds on NumPy, offers special sparse data types that store only the values other than 0. Second, as the dimensionality grows, the number of records needed for training grows with it; otherwise there is a risk of overfitting. This is called the 'curse of dimensionality' or the 'Hughes effect'. Working out how to reduce it is dimensionality reduction.

When we create our vector, if we pass stop_words='english', auxiliary words such as is, was and are are removed and the dictionary is built only for the remaining words. This reduces its dimensionality.

NLTK is another tool that helps in the same way. We can see the dimensionality of the vector being reduced further by using the stemmer and the lemmatizer it provides.


https://gist.github.com/nithyadurai87/491e5e6f9c009ebd88912e71ef9363a4


'''
import nltk
nltk.download()
'''

from sklearn.feature_extraction.text import CountVectorizer
from nltk import word_tokenize
from nltk.stem import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk import pos_tag

def lemmatize(token, tag):
    # lemmatize only nouns and verbs; return every other token unchanged
    if tag[0].lower() in ['n', 'v']:
        return WordNetLemmatizer().lemmatize(token, tag[0].lower())
    return token

corpus = ['Bird is a Peacock Bird', 'Peacock dances very well', 'It eats variety of seeds', 'Cumin seed was eaten by it once']

print(CountVectorizer().fit_transform(corpus).todense())
print(CountVectorizer(stop_words='english').fit_transform(corpus).todense())

print(PorterStemmer().stem('seeds'))
print(WordNetLemmatizer().lemmatize('gathering', 'v'))
print(WordNetLemmatizer().lemmatize('gathering', 'n'))

# stem every token of every sentence
s_lines = []
for document in corpus:
    s_words = []
    for token in word_tokenize(document):
        s_words.append(PorterStemmer().stem(token))
    s_lines.append(s_words)
print('Stemmed:', s_lines)

# tag each token with its part of speech, then lemmatize using that tag
tagged_corpus = []
for document in corpus:
    tagged_corpus.append(pos_tag(word_tokenize(document)))

l_lines = []
for document in tagged_corpus:
    l_words = []
    for token, tag in document:
        l_words.append(lemmatize(token, tag))
    l_lines.append(l_words)
print('Lemmatized:', l_lines)


The NLTK data can be downloaded as follows before use.


import nltk
nltk.download()


'Bird is a Peacock Bird','Peacock dances very well','It eats variety of seeds','Cumin seed was eaten by it once'


1. CountVectorizer() creates a vector for the sentences (4*17).


print(CountVectorizer().fit_transform(corpus).todense())


[[2 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1]
 [0 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 0]
 [0 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0]]


When the vector for the sentences is created with stop_words='english', the words is, very, well, it, of, was, by and once are removed, and we can see that the dimensionality has come down (4*9).


print(CountVectorizer(stop_words='english').fit_transform(corpus).todense())

[[2 0 0 0 0 1 0 0 0]
 [0 0 1 0 0 1 0 0 0]
 [0 0 0 0 1 0 0 1 1]
 [0 1 0 1 0 0 1 0 0]]


2. Even with stop_words='english', seeds and seed are still stored as two separate words. To avoid this, PorterStemmer() finds the root word and stores only that. The various other forms derived from the root are therefore not stored separately.


print(PorterStemmer().stem('seeds'))


seed



3. WordNetLemmatizer() separates and stores words according to their meaning. If a word is used as a noun in one place and as a verb in another, it stores the two separately. For example, in 'I am gathering foods for birds' and 'seeds are stored in the gathering place', the word is stored as two words, gather and gathering.


print(WordNetLemmatizer().lemmatize('gathering', 'v'))


gather


print(WordNetLemmatizer().lemmatize('gathering', 'n'))


gathering


4. When our corpus is passed through NLTK, the output is as follows.


print('Stemmed:', s_lines)


Stemmed: [['bird', 'is', 'a', 'peacock', 'bird'], ['peacock', 'danc', 'veri', 'well'], ['It', 'eat', 'varieti', 'of', 'seed'], ['cumin', 'seed', 'wa', 'eaten', 'by', 'it', 'onc']]


print('Lemmatized:', l_lines)

Lemmatized: [['Bird', 'be', 'a', 'Peacock', 'Bird'], ['Peacock', 'dance', 'very', 'well'], ['It', 'eat', 'variety', 'of', 'seed'], ['Cumin', 'seed', 'be', 'eat', 'by', 'it', 'once']]



Decision Trees & Random Forest 


Decision trees and random forest are non-linear models that support both regression and classification and are used where the data cannot be separated by a straight line. A decision tree generally learns by splitting the data into parts based on the values it holds. In the example below, DecisionTreeClassifier() and RandomForestClassifier() are used to decide whether a flower is a jasmine, a rose or a lotus. Based on the length and width of each flower's lower petal (sepal) and the length and width of its upper petal (petal), the model decides which flower it is. DecisionTreeClassifier() does the work of splitting the given training data into parts and learning from them.


This splitting of the data is done on the basis of conditions. That is why these models are called eager learners. In contrast, KNN is a lazy learner. Random forest learns by the method called ensemble learning. Ensemble means a group. That is, it creates several decision trees and learns by keeping them as a group. Each tree in the group takes a different portion of the training data and trains on it, so the accuracy of the group is higher. We can see the results in the example below: Decision Trees give an accuracy of 89% and Random forest gives an accuracy of 97%. How each of them splits the data while learning is also shown as a graph.


https://gist.github.com/nithyadurai87/d21ffb25b7f5a38d90a437e9f169d58e




from sklearn.datasets import load_iris
import pandas as pd
import os
from sklearn.tree import DecisionTreeClassifier,export_graphviz
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report
from io import StringIO
import pydotplus
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from IPython.display import Image
import matplotlib.pyplot as plt
import seaborn as sns

# every column except the last is a feature; the last column is the label
df = pd.read_csv('./flowers.CSV')
X = df[list(df.columns)[:-1]]
y = df['Flower']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

# decision tree
a = DecisionTreeClassifier(criterion = "entropy", random_state = 100,max_depth=3, min_samples_leaf=5) # gini
a.fit(X_train, y_train)
y_pred = a.predict(X_test)

print ("Confusion Matrix: ", confusion_matrix(y_test, y_pred))
print ("Accuracy : ", accuracy_score(y_test,y_pred)*100)
print ("Report : ", classification_report(y_test, y_pred))

# draw the fitted tree
dot_data = StringIO()
export_graphviz(a, out_file=dot_data,filled=True, rounded=True,special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
graph.write_png("decisiontree.png")

# random forest
b = RandomForestClassifier(max_depth = None, n_estimators=100)
b.fit(X_train,y_train)
y_pred = b.predict(X_test)

print ("Confusion Matrix: ", confusion_matrix(y_test, y_pred))
print ("Accuracy : ", accuracy_score(y_test,y_pred)*100)
print ("Report : ", classification_report(y_test, y_pred))

# draw one tree (index 5) out of the forest
export_graphviz(b.estimators_[5], out_file='tree.dot', feature_names = X_train.columns.tolist(),
                class_names = ['Lotus', 'Jasmin', 'Rose'],
                rounded = True, proportion = False, precision = 2, filled = True)
os.system ("dot -Tpng tree.dot -o randomforest.png -Gdpi=600")
Image(filename = 'randomforest.png')

# feature importances; the source line is cut off after ".sort", sort_values is assumed here
f = pd.Series(b.feature_importances_,index=X_train.columns.tolist()).sort_values(ascending=False)
sns.barplot (x=f, y=f.index)
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.legend()
plt.show()



Explanation of the program:


flowers.CSV contains 150 records in total. With train_test_split(), 112 records are used for training and the remaining 38 are used to test the trained model.

In the decision tree below, samples=112 inside the root node refers to the 112 records given for training. value = [34,41,37] says that 34 of those records are jasmine, 41 are lotus and 37 are rose. entropy = 1.581 is the uncertainty / disorder / impurity in the samples, that is, how mixed the records are across the classes we want to separate. It is calculated as follows. First find what fraction of the total each class makes up. Then take the log base 2 of that fraction; a calculator for this is available at https://www.miniwebtool.com/log-base-2-calculator/. Do this separately for each class (rose, lotus and so on), multiplying each fraction by its logarithm. Finally add the results and multiply the sum by minus one to make it positive. That value is the entropy.


Entropy = - {Summation of (fraction of each class * log base 2 of that fraction)}
= -{ (34/112)*log2(34/112) + (41/112)*log2(41/112) + (37/112)*log2(37/112) }
= -{ (0.3035)*log2(0.3035) + (0.3661)*log2(0.3661) + (0.3303)*log2(0.3303) }
= -{ (0.3035)*(-1.7202) + (0.3661)*(-1.4496) + (0.3303)*(-1.5981) }
= -{ -0.5220 + -0.5307 + -0.5278 }
= -{ -1.5805 }
= 1.581
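The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch and not part of the original program: the helper name `entropy` is hypothetical, and the class counts [34, 41, 37] are the ones shown in the root node.

```python
import math

def entropy(counts):
    """Shannon entropy (log base 2) of a list of class counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # a class with no samples contributes nothing
        fraction = count / total
        result -= fraction * math.log2(fraction)
    return result

# class counts of the root node: jasmine, lotus, rose
root_entropy = entropy([34, 41, 37])
print(round(root_entropy, 3))
```

A node that holds records of a single class, such as [0, 0, 37], gives an entropy of 0 with the same helper.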





The calculated entropy can be seen in the first node of the figure. Because this value is not 0, the 112 records are split by a condition into two groups of 37 and 75 records. Records whose X2 value, that is Petal_length, is 2.35 or less go to the node on the left, and the rest go to the node on the right. The entropy is then calculated again for each of the two groups. The entropy of the node on the left is 0.0. Such a node is called a decision node: an entropy of 0 means that all the records in it belong to a single class, which is why its value is [0,0,37]. The counts for jasmine and lotus are 0.

[Figure: the decision tree produced by the program]

The count for rose is 37, so this is the decision node that decides a flower is a rose. The remaining nodes in the figure are built in the same way, by testing the other features.

In the last level of the figure there are decision nodes that decide whether a flower is a jasmine or a lotus. Look at the value entries of the 3 nodes on the left of the last level. Instead of keeping all 34 jasmine records together, they are spread over separate decision nodes as 30, 3 and 1.

Similarly, in the 3 nodes on the right, the 41 lotus records are spread as 30, 8 and 3. This is why the entropy of these nodes is close to 0 rather than exactly 0.


Information Gain


Information Gain is how much information a feature provides for classifying the given records. Like entropy, it is a metric. Entropy measures impurity, and the metric for reducing impurity is called gini gain. The formula is as follows.


Information Gain = Parent's entropy - weighted average of the children's entropy


Weighted average of the children's entropy
= [(no. of examples in left child node) / (total no. of examples in parent node) * (entropy of left node)]
+ [(no. of examples in right child node) / (total no. of examples in parent node) * (entropy of right node)]
= (37/112)*0.0 + (75/112)*0.994
= 0 + 0.665625
= 0.665

Information Gain = 1.581 - 0.665 = 0.916
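The same calculation can be sketched in Python. The function names below are hypothetical, and the class counts of the right child, [34, 41, 0], are an assumption obtained by subtracting the left child [0, 0, 37] from the parent [34, 41, 37]. With unrounded intermediate values the gain comes out at roughly 0.915; a hand calculation with rounded values can differ in the last digit.

```python
import math

def entropy(counts):
    """Shannon entropy (log base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    parent_total = sum(parent)
    weighted = sum(sum(child) / parent_total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [34, 41, 37]   # jasmine, lotus, rose in the root node
left = [0, 0, 37]       # records with Petal_length <= 2.35
right = [34, 41, 0]     # assumed: everything that is not in the left node

gain = information_gain(parent, [left, right])
print(round(entropy(right), 3))
print(round(gain, 3))
```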


Inside DecisionTreeClassifier(), passing "gini" instead of criterion = "entropy" builds the branches with the gini formula instead.

The diagram of the model built by Random Forest is as follows.

[Figure: one tree from the random forest model]

How much each feature in the samples contributes to the classification in Random forest can be seen in the following chart.

[Figure: bar chart of feature importance; x-axis: Feature Importance Score, y-axis: Features]





Clustering with K-Means 


This is an unsupervised learning algorithm. Everything we have seen so far is supervised. In logistic regression, multi-class classification and the like, we give both the inputs (X) and the outputs (Y) for training. To split records into different output classes, we also say how many classes there are and what they are. In unsupervised learning only the inputs are given. How many classes to split into and what those classes are is not given. In this kind of clustering the groups are found with K-means, and the number of groups to split into can be found with the elbow method. In other words, teaching with the answers is supervised, and letting the model learn without the answers is unsupervised.


Below, two features named X1 and X2 are given. There is no Y. Using only the inputs, we have to group the records into clusters ourselves.


x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]


The explanation is as follows.


https://gist.github.com/nithyadurai87/185e332ebce7028af265adbe86db40d5




import matplotlib.pyplot as plt
import math

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]

# draw the first cluster as dots and the second as stars
def plots(cluster1_x1,cluster1_x2,cluster2_x1,cluster2_x2):
    plt.figure()
    plt.plot(cluster1_x1,cluster1_x2,'.')
    plt.plot(cluster2_x1,cluster2_x2,'*')
    plt.grid(True)
    plt.show()

# first pass: assign every record to the nearer of the two given centroids
def round1(c1_x1,c1_x2,c2_x1,c2_x2):
    cluster1_x1 = []
    cluster1_x2 = []
    cluster2_x1 = []
    cluster2_x2 = []
    for i,j in zip(x1,x2):
        a = math.sqrt(((i-c1_x1)**2 + (j-c1_x2)**2))
        b = math.sqrt(((i-c2_x1)**2 + (j-c2_x2)**2))
        if a < b:
            cluster1_x1.append(i)
            cluster1_x2.append(j)
        else:
            cluster2_x1.append(i)
            cluster2_x2.append(j)
    plots(cluster1_x1,cluster1_x2,cluster2_x1,cluster2_x2)
    # the mean of each cluster becomes the centroid for the next pass
    c1_x1 = sum(cluster1_x1)/len(cluster1_x1)
    c1_x2 = sum(cluster1_x2)/len(cluster1_x2)
    c2_x1 = sum(cluster2_x1)/len(cluster2_x1)
    c2_x2 = sum(cluster2_x2)/len(cluster2_x2)
    round2(c1_x1,c1_x2,c2_x1,c2_x2)

# second pass: assign every record again using the updated centroids
def round2(c1_x1,c1_x2,c2_x1,c2_x2):
    cluster1_x1 = []
    cluster1_x2 = []
    cluster2_x1 = []
    cluster2_x2 = []
    for i,j in zip(x1,x2):
        c = math.sqrt(((i-c1_x1)**2 + (j-c1_x2)**2))
        d = math.sqrt(((i-c2_x1)**2 + (j-c2_x2)**2))
        if c < d:
            cluster1_x1.append(i)
            cluster1_x2.append(j)
        else:
            cluster2_x1.append(i)
            cluster2_x2.append(j)
    plots(cluster1_x1,cluster1_x2,cluster2_x1,cluster2_x2)

plots(x1,x2,[],[])
round1(x1[4],x2[4],x1[10],x2[10])


How the two features X1 and X2 are distributed can be seen with a scatter plot. Which records belong to the second cluster has not been worked out yet, so its lists are passed as empty lists.


plots(x1,x2,[],[])


[Figure: scatter plot of the records; x-axis: x1, y-axis: x2]
Centroids


To create two clusters, two values from X1 and two values from X2 have to be chosen at random. For the first cluster we choose 13 from X1 and 17 from X2. In the same way, for the second cluster we choose 9 from X1 and 10 from X2. These are called the centroids. Using them as the basis, we are going to split all the records into two clusters. So the distance from every record to each of the two chosen centroids is calculated with the following formula.


distance1 = sqrt((x1_data - 13)**2 + (x2_data - 17)**2)
distance2 = sqrt((x1_data - 9)**2 + (x2_data - 10)**2)

For the record (15, 13):
distance1: (15-13)**2 + (13-17)**2 = 4 + 16 = 20, sqrt(20) = 4.47
distance2: (15-9)**2 + (13-10)**2 = 36 + 9 = 45, sqrt(45) = 6.70

For the record (19, 16):
distance1: (19-13)**2 + (16-17)**2 = 36 + 1 = 37, sqrt(37) = 6.08
distance2: (19-9)**2 + (16-10)**2 = 100 + 36 = 136, sqrt(136) = 11.66

x1    x2    distance1    distance2
15    13    4.47         6.70
19    16    6.08         11.66
15    17    2            9.21
5     6     13.6         5.65
13    17    0            8
17    14    5            8.94
15    15    2.82         7.81
12    13    4.12         4.24
8     7     11.18        3.16
6     6     13.03        5
9     10    8.06         0
13    12    5            4.47
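The assignment step described by these distances can be sketched as follows. This is an illustrative snippet rather than the book's listing: the variable names are hypothetical, and the starting centroids (13, 17) and (9, 10) are the ones chosen in the text.

```python
import math

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]

centroid1 = (13, 17)
centroid2 = (9, 10)

cluster1, cluster2 = [], []
for point in zip(x1, x2):
    # Euclidean distance from the record to each centroid
    d1 = math.hypot(point[0] - centroid1[0], point[1] - centroid1[1])
    d2 = math.hypot(point[0] - centroid2[0], point[1] - centroid2[1])
    # the record joins the cluster whose centroid is nearer
    if d1 < d2:
        cluster1.append(point)
    else:
        cluster2.append(point)

print(cluster1)
print(cluster2)
```

Seven records end up nearer to the first centroid and five nearer to the second.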


The records whose distance to the first centroid is smaller form the first cluster, and all the others form the second. In the table above they are shown in yellow and purple respectively. In this way four lists are found: x1 and x2 for the first cluster, and x1 and x2 for the second cluster. They are plotted as dots and as stars respectively.

cluster1_x1 = [15, 19, 15, 13, 17, 15, 12]
cluster1_x2 = [13, 16, 17, 17, 14, 15, 13]
cluster2_x1 = [5, 8, 6, 9, 13]
cluster2_x2 = [6, 7, 6, 10, 12]



















plots(cluster1_x1,cluster1_x2,cluster2_x1,cluster2_x2)

After the two clusters have been formed in this way, two centroids are chosen again from them. This time they are not chosen at random. The mean of x1 and of x2 in each cluster is calculated, and these means become the new centroids. With them we can form two clusters that are more accurate.


c1_x1 = (15 + 19 + 15 + 13 + 17 + 15 + 12) / 7
= 106/7
= 15.14


c1_x2 = (13 + 16 + 17 + 17 + 14 + 15 + 13) / 7
= 105/7
= 15


c2_x1 = (5 + 8 + 6 + 9 + 13) / 5
= 41/5
= 8.2


c2_x2 = (6 + 7 + 6 + 10 + 12) / 5
= 41/5
= 8.2
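The update step can be sketched in the same style. The helper names `mean_point` and `distance` are hypothetical; the two starting clusters are the ones obtained after the first assignment. The snippet recomputes the centroids as cluster means and then assigns every record once more.

```python
import math

cluster1 = [(15, 13), (19, 16), (15, 17), (13, 17), (17, 14), (15, 15), (12, 13)]
cluster2 = [(5, 6), (8, 7), (6, 6), (9, 10), (13, 12)]

def mean_point(points):
    """Centroid of a cluster: the mean of x1 and the mean of x2."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

c1 = mean_point(cluster1)
c2 = mean_point(cluster2)
print(c1, c2)

# second assignment, now against the mean-based centroids
new_cluster1, new_cluster2 = [], []
for point in cluster1 + cluster2:
    if distance(point, c1) < distance(point, c2):
        new_cluster1.append(point)
    else:
        new_cluster2.append(point)

print(new_cluster1)
print(new_cluster2)
```

With these centroids the record (13, 12) moves from the second cluster to the first, which shows why the assignment has to be repeated after every update.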


Then the distance between every record and the centroids is calculated again, and each record joins the cluster whose centroid is nearer. Here too we can see two clusters that split the records more accurately than before.

We keep repeating the same steps until the records settle into the cluster that suits them. This is called clustering with k-means. k refers to how many clusters are to be created, and means refers to calculating the average of each feature and forming the clusters on that basis. Let us now see how to find the value of k.























Elbow Method 


This method helps to find, with a graph, how many clusters are appropriate for the given records. We will use the same records as above here as well, and see whether it correctly tells us that 2 clusters are right. The program and its explanation are as follows.


https://gist.github.com/nithyadurai87/10b5b273151c80be97579d684279cd84


from sklearn.cluster import KMeans
from sklearn import metrics
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]

# combine the two features into one matrix of shape (12, 2)
X = np.array(list(zip(x1, x2)))

distortions = []
K = range(1,8)
for i in K:
    model = KMeans(n_clusters=i)
    model.fit(X)
    # mean distance from every record to its nearest centroid
    distortions.append(sum(np.min(cdist(X, model.cluster_centers_, 'euclidean'), axis=1)) / X.shape[0])

plt.plot()
plt.plot(K, distortions, 'bx-')
plt.show()

The two lists x1 and x2 are converted with numpy into a single matrix X. kmeans is then trained on these records. The training is repeated with the number of clusters set to each value from 1 to 7. Each time, the distance between the records and their centroids is calculated. As the number of clusters grows, this distance shrinks. The distance value is called the cost / distortion.

These values are then drawn as a graph. Its x axis holds the number of clusters and its y axis holds the distance value. This is why, when all the records are placed in one cluster, the distance of the records from that cluster's centroid is shown as slightly above 5, and when the same records are split into 7 clusters, the distance is shown as close to 1. Because the graph has the shape of a bent arm, this is called the Elbow method. The point where the graph bends like an elbow is at 2, and we take it that splitting the records into that many clusters is enough. Beyond that point the distance values decrease only a little. The major change happens at that point alone. So it is found that splitting the records into 2 clusters is right.
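To make the distortion values concrete without any library, here is a small pure-Python sketch. It is not sklearn's KMeans: the function names are hypothetical, and starting from the first k records as centroids is an assumption made only to keep the result repeatable (sklearn chooses its starting centroids differently).

```python
import math

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
points = list(zip(x1, x2))

def nearest(point, centroids):
    """Index of the nearest centroid and the distance to it."""
    distances = [math.hypot(point[0] - c[0], point[1] - c[1]) for c in centroids]
    best = distances.index(min(distances))
    return best, distances[best]

def kmeans(points, k, max_iter=100):
    centroids = [points[i] for i in range(k)]  # assumption: first k records as start
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids)[0] for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids

def distortion(points, centroids):
    """Mean distance from every record to its nearest centroid."""
    return sum(nearest(p, centroids)[1] for p in points) / len(points)

distortions = [distortion(points, kmeans(points, k)) for k in range(1, 5)]
for k, value in zip(range(1, 5), distortions):
    print(k, round(value, 3))
```

The drop between one cluster and two clusters is by far the largest in this sketch, which is the bend that the elbow method looks for.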




silhouette coefficient 


This measures how well an algorithm has performed. In everything we have seen so far, the performance was found by comparing the algorithm's predictions with the true values. In unsupervised learning such as k-means there are no true values to compare against, and silhouette_coefficient is a method that helps to find the performance in that case.


We have already seen distortion as one way to check whether the records classified by k-means have been classified correctly. It measures the performance of kmeans from how far each record is from its centroid. In the same way, silhouette_coefficient uses the following formula to find how compactly each cluster holding the records has been separated.


(b - a) / max(a, b)


a is the mean distance between the records inside one cluster. b is the mean distance between the records of one cluster and the records of the other cluster.
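A pure-Python sketch of this formula, averaged over all records, is given below. The function names are hypothetical. The label list is the 2-cluster assignment reported for these records, and for more than two clusters b is taken from the nearest other cluster, which is the usual definition.

```python
import math

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
points = list(zip(x1, x2))
labels = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]

def mean_distance(point, others):
    return sum(math.hypot(point[0] - o[0], point[1] - o[1]) for o in others) / len(others)

def mean_silhouette(points, labels):
    clusters = sorted(set(labels))
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other records of the same cluster
        own = [q for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        if not own:
            scores.append(0.0)  # a cluster with a single record scores 0
            continue
        a = mean_distance(p, own)
        # b: mean distance to the records of the nearest other cluster
        b = min(mean_distance(p, [q for q, l in zip(points, labels) if l == other])
                for other in clusters if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

score = mean_silhouette(points, labels)
print(round(score, 4))
```

For these labels the sketch gives about 0.6366, in line with the value printed by metrics.silhouette_score for 2 clusters.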


In the example below, our records are split into 2 clusters with kmeans. In the same way, a for loop splits them into 3, 4, 5 and 8 clusters. Inside the loop, each time the model is built with the given number of clusters, a plot is drawn to show how the records are split and the silhouette_coefficient value is printed.

https://gist.github.com/nithyadurai87/f5f043df412b6e3c8291d0080422bd92

import numpy as np
from sklearn.cluster import KMeans
from sklearn import metrics
import matplotlib.pyplot as plt

plt.subplot(3, 2, 1)

x1 = [15, 19, 15, 5, 13, 17, 15, 12, 8, 6, 9, 13]
x2 = [13, 16, 17, 6, 17, 14, 15, 13, 7, 6, 10, 12]
plt.scatter(x1, x2)

X = np.array(list(zip(x1, x2)))

# one colour and one marker per cluster (up to 8 clusters)
c = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'b']
m = ['o', 's', 'D', 'v', '^', 'p', '*', '+']

p = 1
for i in [2, 3, 4, 5, 8]:
    p += 1
    plt.subplot (3, 2, p)
    model = KMeans(n_clusters=i).fit (X)
    print (model.labels_)
    # plot every record with the colour and marker of its cluster
    for i, j in enumerate(model.labels_):
        plt.plot (x1[i], x2[i], color=c[j], marker=m[j],ls='None')
    print (metrics.silhouette_score(X, model.labels_ ,metric='euclidean'))

plt.show()


print (model.labels_) shows which cluster each of the 12 records in x1, x2 has been placed in, with 0 for the first cluster and 1 for the second. The labels and the coefficient value are printed as follows.


[1 1 1 0 1 1 1 1 0 0 0 1]
0.6366488776743281


Similarly, when the records are split into 3 clusters, 0 marks the first cluster, 1 the second and 2 the third, as follows.


[0 0 0 1 0 0 0 2 1 1 1 2]
0.38024538066050284


In the same way, when the records are split into 4, 5 and 8 clusters, the cluster each record joined and the score of that clustering are printed as follows. Comparing them shows that the highest score (0.63) is obtained only when the records are split into 2 clusters.


[2 0 0 1 0 0 0 2 1 1 3 2]
0.32248773306926665


[2 4 0 1 0 4 0 2 1 1 3 2]
0.38043265897525885


[6 7 3 4 3 1 6 2 0 4 5 2]
0.27672998081717154


The first plot in the figure shows only the records. The second is the plot obtained when they are split into 2 clusters. The plots after that are the ones obtained with 3, 4, 5 and 8 clusters. The records are split into as many as 8 clusters, so to tell the records of each cluster apart, two lists are created with 8 colours and 8 different markers. Inside the loop they are picked one by one as follows.
Support Vector Machine (SVM)


Support Vector Machine (SVM) is a method for classifying records. We have already looked at logistic regression. SVM does the work of classification more accurately than logistic.

In this part we will see how the large margin classifier helps with records that can be separated by a straight line, and how kernels help with records that cannot be separated by a straight line.



Large margin classifier (linear) 


Let us show with an example how logistic and svm each classify records that can be separated by a straight line. There are two features, x1 and x2. They are converted with numpy into a single matrix with 2 dimensions (2 dimension matrix). Then logistic and svm are both trained on these records. A function named classifier() is written to find out where each of them places the line that separates the records.


https://gist.github.com/nithyadurai87/2de5a6a6f7cc03c2791305f5c33d43d7


import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.linear_model import LogisticRegression

# draw the separating line of the current model together with the records
def classifier():
    xx = np.linspace(1,10)
    yy = -regressor.coef_[0][0] / regressor.coef_[0][1] * xx - regressor.intercept_[0] / regressor.coef_[0][1]
    plt.plot(xx, yy)
    plt.scatter(x1,x2)
    plt.show()

x1 = [2,6,3,9,4,10]
x2 = [3,9,3,10,2,13]

X = np.array([[2,3],[6,9],[3,3],[9,10],[4,2],[10,13]])
y = [0,1,0,1,0,1]

regressor = LogisticRegression()
regressor.fit(X, y)
classifier()

regressor = svm.SVC(kernel='linear',C = 1.0)
regressor.fit(X,y)
classifier()

When the records are classified with logistic, the line is placed as follows. It runs very close to the class at the bottom, leaving almost no gap. The gap between the line and the class at the top, however, is very large.

[Figure: separating line found by logistic regression]

When the records are classified with SVM, the line lies in the middle of the two classes, at an equal distance from both. This is why it is called an equal margin / large margin classifier. It is regarded as an optimization of logistic regression.

[Figure: separating line found by SVM]

Kernels (non-linear) 


A kernel is used to classify records that are laid out in a somewhat complex, non-linear way and cannot be separated with a straight line. We have already seen polynomial regression as a way of fitting records that do not fit a straight line. There, higher order values of each feature are calculated and added as new features. To fit the records fully, features in square and cube order are calculated and added. Doing so increases the number of features in the training data, so the time and the computation the algorithm needs to learn also grow, and all of them have to be held in memory. Kernels / similarity functions are used to avoid this.


Instead of adding new features on top of the old ones, a kernel finds new features from the existing ones and uses those. For example, assume our training data has 5 features and 100 sample records. With polynomial, square and cube values are calculated for the 5 features, and in the end there are more than 20 features. With a kernel, one record is chosen from the 100 samples and set as a landmark. Then how close the other records are to it is calculated. Records near the landmark are marked 1 and the others 0. The new feature is calculated on this basis. In other words, for the 5 features in the training data only 5 new features are calculated in this method.


The formula for the similarity function is as follows. This is also called the kernel. Several formulas can be used for this calculation. One of them, using exp(), is given here. It is called the gaussian kernel. There are other kinds of formulas as well, such as the polynomial kernel, string kernel, chi-squared kernel and histogram-intersection kernel.

f1 = similarity(x, l1)

= exp(-(||x - l1||**2 / (2 * sigma**2)))
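The gaussian similarity can be sketched in a few lines. The function name `gaussian_similarity`, the landmark (3.0, 3.0) and sigma = 1.0 are all hypothetical values chosen for illustration.

```python
import math

def gaussian_similarity(x, landmark, sigma=1.0):
    """exp(-||x - l||^2 / (2 * sigma^2)) for two points of equal length."""
    squared_distance = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-squared_distance / (2 * sigma ** 2))

landmark = (3.0, 3.0)  # hypothetical landmark record

same = gaussian_similarity((3.0, 3.0), landmark)   # the landmark itself
near = gaussian_similarity((3.5, 3.0), landmark)   # a record close to it
far = gaussian_similarity((9.0, 10.0), landmark)   # a record far away
print(same, near, far)
```

The value is exactly 1 at the landmark and falls towards 0 as the record moves away; a larger sigma makes it fall more slowly.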


SVM without kernels behaves like logistic regression. Classifying directly with the raw features, without the new features created by kernels, is what logistic regression does. So let us see when to use a kernel and when to use logistic. If the number of features (100000 or 100) is very much larger or very much smaller than the number of samples given for training (10000), svm without kernel can be used. If the number of features (1000) is not very large and the number of records is moderately large, svm with kernel can be used.


In the example below, a flower is classified as jasmine, rose or lotus from several features. We can see that the accuracy is higher when the records are classified with a kernel than with svm without kernel, that is, with logistic.


https://gist.github.com/nithyadurai87/9d7cc99cc4ae18a3707cc76f8711193b



import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.linear_model import LogisticRegression
from matplotlib.colors import ListedColormap

df = pd.read_csv('./flowers.CSV')
X = df[list(df.columns)[:-1]]
y = df['Flower']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

# logistic regression on the raw features
logistic = LogisticRegression()
logistic.fit(X_train, y_train)
y_pred = logistic.predict(X_test)
print ('Accuracy-logistic:', accuracy_score(y_test, y_pred))

# SVM with the gaussian (rbf) kernel
gaussian = SVC(kernel='rbf')
gaussian.fit(X_train, y_train)
y_pred = gaussian.predict(X_test)
print ('Accuracy-svm:', accuracy_score(y_test, y_pred))


Accuracy-logistic: 0.868421052631579
Accuracy-svm: 0.9736842105263158



PCA - Principal Component Analysis


Principal Component Analysis is used to convert records with many dimensions into records with fewer dimensions. Assume that something is predicted from 1000 features. PCA converts these 1000 X into 100 X, or into even fewer dimensions. It does not concern itself with the number of Y; it reduces only the number of X. This is why PCA is a special method that helps with dimensionality reduction. The steps it follows are:

• Take a large dataset (x1,y1),(x2,y2),(x3,y3)...

• With PCA, reduce all the x in the data to the smaller number that we need

• Then train with the reduced, new x


PCA is not used in general; it is used only in rare cases. For example, the sample records given to an algorithm that identifies faces or handwritten characters will have at least 1 lakh (100,000) features, because the eyes, nose, mouth, cheeks and hair of a face, and every small detail in each of them, are all identified through features. In places with this many features, PCA is used to convert them into a smaller number of features instead of using all of them. Before using PCA, a step called feature scaling must always be carried out. This is also called data-preprocessing.
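Feature scaling of the kind done by StandardScaler can be sketched as follows: each column is shifted to mean 0 and scaled to standard deviation 1. The function name `standardize` and the sample values are hypothetical; the population standard deviation (dividing by n) is used, which is what StandardScaler does.

```python
import math

def standardize(column):
    """Scale one feature column to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    # population standard deviation (divide by n)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

sepal_length = [5.1, 4.9, 6.3, 5.8, 7.1]  # hypothetical sample values
scaled = standardize(sepal_length)
print([round(v, 3) for v in scaled])
```

After scaling, features measured on different ranges contribute on an equal footing, which matters because PCA looks for the directions with the largest variance.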



So that it is easy for us to follow, records with 4 dimensions are converted into 2 dimensions with PCA. Before using PCA, the records are normalized with StandardScaler. The 4 features that decide whether a flower is a jasmine, a rose or a lotus (the length and width of its sepal and the length and width of its petal) are then converted by PCA into two features named x1 and x2. Based on these two, the 3 kinds of flowers are plotted in 3 colours.


https://gist.github.com/nithyadurai87/20d18bbda53e43de19222e24d330a398


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv('./flowers.CSV')
X = df[list(df.columns)[:-1]]
y = df['Flower']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)

pca = PCA(n_components=2)
# feature scaling before PCA
x = StandardScaler().fit_transform(X_train)
new_x = pd.DataFrame(data = pca.fit_transform(x), columns = ['x1', 'x2'])

# X_train was shuffled by train_test_split, so the labels are aligned by position
df2 = pd.concat([new_x, y_train.reset_index(drop=True)], axis = 1)

fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('x1', fontsize = 15)
ax.set_ylabel('x2', fontsize = 15)
ax.set_title('2 Components', fontsize = 20)
for i, j in zip(['Rose', 'Jasmin', 'Lotus'],['g', 'b', 'r']):
    ax.scatter(df2.loc[df2['Flower'] == i, 'x1'], df2.loc[df2['Flower'] == i, 'x2'], c = j)
ax.legend(['Rose', 'Jasmin', 'Lotus'])
ax.grid()
plt.show()

print (pca.explained_variance_ratio_)
print (df.columns)
print (df2.columns)


[0.72207932 0.24134489] 


Index(['Sepal_length', 'Sepal_width', 'Petal_length', 'Petal_width', 'Flower'], 
dtype='object') 


Index(['x1', 'x2', 'Flower'], dtype='object')


[Figure: "2 Components" scatter plot; x-axis: x1, y-axis: x2; legend: Rose, Jasmin, Lotus]


The output here, the explained variance, is [0.72207932 0.24134489]. Adding these two values gives about 0.96. That is, these two components together hold about 96% of the information, so very little is lost when the number of features is reduced. How much variance ends up in each component, and how PCA works, are explained in more detail next.
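As a small illustration of the addition described above, the sketch below accumulates the two printed ratios. The function name `cumulative_variance` is hypothetical and only serves this example; the two numbers are the ones printed by the program.

```python
# Ratios printed by pca.explained_variance_ratio_ in the listing above
explained_variance_ratio = [0.72207932, 0.24134489]

def cumulative_variance(ratios):
    """Return the running total of variance retained as components are added."""
    total = 0.0
    running = []
    for r in ratios:
        total += r
        running.append(total)
    return running

retained = cumulative_variance(explained_variance_ratio)
print(retained)  # the last entry is the share kept by both components together
```

The last entry is roughly 0.96, which is where the 96% figure comes from.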



Data Projection 


The line or plane onto which data is mapped in order to reduce its dimensions is called the projection line or the projection area. See the figures below. On the left is data with 2 dimensions that is to be converted to 1 dimension: a scatter plot of the two features x1 and x2. The line drawn through the middle is the projection line, and every data point moves onto that line and so becomes one-dimensional. Similarly, on the right is data for the 3 features x1, x2, x3. Its projection area is identified as if a plane with 2 dimensions were drawn through the data. All the surrounding data points move onto that flat region and are converted into vectors with 2 dimensions.
Projection Error


In both figures above, the distance between where a data point actually lies and where it is projected is called the projection error. The picture for converting 2d to 1d may look like linear regression, but PCA is not linear regression, because it is not used for prediction. It is used only for projection. It does not use the line to compute Y values; it only relocates the x values. Further, in linear regression the sum of squares error is measured vertically, whereas in PCA the projection error is measured perpendicular to the line.

[Figure: linear regression error compared with PCA projection error]
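The difference between the two kinds of error can be sketched for a single point and a straight line. The function names and the line `y = slope * x + intercept` are assumptions made only for this illustration.

```python
import math

def vertical_error(x, y, slope, intercept):
    """Linear-regression style error: measured straight up or down to the line."""
    return y - (slope * x + intercept)

def projection_error(x, y, slope, intercept):
    """PCA style error: shortest (perpendicular) distance from the point to the line."""
    return abs(slope * x - y + intercept) / math.sqrt(slope ** 2 + 1)

# The point (0, 1) against the line y = x
print(vertical_error(0, 1, 1, 0))
print(projection_error(0, 1, 1, 0))
```

For the same point and line, the perpendicular distance is never larger than the vertical one.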











Compressed components 


How data with many dimensions is compressed into a small number of dimensions, and what steps are involved, can be seen as follows.

1. Feature scaling must be applied to all the data (x1, x2, x3...xn). This is called data preprocessing.

2. A covariance matrix is built to capture how the data varies between the features. It is denoted sigma, and its formula is as follows.

covariance matrix / sigma = (1/m).summation of(1 to m)[x . transpose of x]

Check that this matrix has the symmetric positive definite property. Only then can it be used to build the vectors for the projection.

3. The vectors for the projection can be built with the function svd() or eig(). These stand for singular value decomposition and eigenvectors respectively. As follows.

[u,s,v] = svd(sigma)

This produces 3 matrices. u is the matrix used for the projection. It has the columns u1, u2, u3...un. From these we select only as many as we need: u1, u2, u3...uk, where k is the number of principal components. The formula is as follows.

principal components = transpose of (u[:, 1:k]).x

4. We then need to find how many principal components can be kept without losing information. This is decided through the variance: retaining 99% of the variance is generally considered enough. So the formula for choosing the value of k is as follows.

Average squared projection error / Total variance in the data <= 0.01 (so 99% of the variance is retained)

Where,

Avg. squared projection error = (1/m).summation of(1 to m). square of (x - projected x)

Total variance in the data = (1/m).summation of(1 to m). square of (x)

One way is to try each value of k in the formula above and see when the condition is met. Instead, k can be found directly by putting the matrix S returned by svd() into the following formula.

summation of(1 to k) S[i,i] / summation of(1 to n) S[i,i] >= 0.99

In this way, data with many dimensions is reduced to a small number of dimensions with hardly any loss.
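The four steps can be sketched for the smallest case, 2 dimensions reduced to 1. This is only an illustration under stated assumptions: instead of svd(), it uses the closed-form eigenvalues of a 2x2 symmetric matrix so that it needs nothing beyond the standard library, and the function name and sample points are made up.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns the 1-D values z and the ratio
    (average squared projection error) / (total variance).
    """
    m = len(points)
    # Step 1: mean-normalise each feature (a simple form of feature scaling)
    mean_x = sum(p[0] for p in points) / m
    mean_y = sum(p[1] for p in points) / m
    centred = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    # Step 2: covariance matrix sigma = (1/m) * sum(x . transpose of x)
    sxx = sum(x * x for x, _ in centred) / m
    sxy = sum(x * y for x, y in centred) / m
    syy = sum(y * y for _, y in centred) / m

    # Step 3: largest eigenvalue of [[sxx, sxy], [sxy, syy]] and its eigenvector
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest = (sxx + syy) / 2 + root
    if sxy != 0:
        u = (largest - syy, sxy)
    else:
        u = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(u[0], u[1])
    u = (u[0] / norm, u[1] / norm)

    # Step 4: z = transpose of u . x, then measure how much was lost
    z = [x * u[0] + y * u[1] for x, y in centred]
    projected = [(zi * u[0], zi * u[1]) for zi in z]
    avg_error = sum((x - px) ** 2 + (y - py) ** 2
                    for (x, y), (px, py) in zip(centred, projected)) / m
    return z, avg_error / (sxx + syy)

# Made-up points lying close to the line y = 2x
z, lost = pca_2d_to_1d([(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)])
print(lost <= 0.01)  # True when at least 99% of the variance is retained
```

For more than two dimensions the eigenvectors no longer have such a simple closed form, which is where a routine such as svd() comes in.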



Neural Networks 


A neural network is built on the model of how the human brain works. At birth the human brain knows nothing. Then a neuron in it begins to learn something new. Another neuron takes what has already been learned and learns one more new thing on top of it. In this way many neurons, linked one to another like a web, keep on learning new things.

A neural network built on this basis classifies each thing into a class. Take a classification problem such as binary classification with two features, x1 and x2. Logistic regression takes these features directly and computes h(x). A neural network, however, does not use the raw features directly. It first creates a hidden layer with several activation units and then makes its prediction from those. As follows.

h(x) = g(theta0.a0 + theta1.a1 + theta2.a2)

Where

a0 = g(theta0.x0 + theta1.x1 + theta2.x2)

Likewise a1 & a2

The value of an activation unit lies between 0 and 1, because its parameters and features are placed inside the sigmoid function. This gives the value of the activation unit, and the value of every activation unit is found in the same way. The parameters are called theta; in neural networks they are called weights. So the final predicted h(x) values are computed in the sigmoid function by combining the activation units with their weights.
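A single activation unit can be sketched as below. The function names are assumptions for illustration; the only idea taken from the text is that weights and inputs are combined and passed through the sigmoid function, with the bias input fixed at 1.

```python
import math

def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def activation_unit(weights, inputs):
    """weights[0] belongs to the bias unit, whose input is always 1."""
    z = weights[0] * 1.0
    for w, x in zip(weights[1:], inputs):
        z += w * x
    return sigmoid(z)

# With all weights at zero the unit sits exactly in the middle of its range
print(activation_unit([0.0, 0.0, 0.0], [0.4, 0.3]))
```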



Neural Network □□□□□□□ 


When the number of features needed for a prediction is very large, neural networks can be used instead of logistic regression.

A neural network for binary classification is arranged as follows.

[Figure: network with Layer1 (Input) and Layer2 (Activation/hidden); legend: bias unit, unit]

A neural network for multi-class classification is arranged as follows.

[Figure: network with Layer1 (Input), Layer2 (Activation/hidden) and Layer3 (Output)]


The values x0 and a0, which are added alongside the other units of a layer, are called bias units.

Input layer: appears first.

Output layer: appears last and holds the computed predictions.

Hidden layer / Activation layer: appears in between as one or more hidden layers. It holds the hidden working units (activation units).










h(x) □□□□□□□□□□ □□□□□□□ □□□□□ 


Below, weights are given for each unit. Putting them into the sigmoid function gives the value of h(x) for each unit.

Depending on the values of the weights, the result behaves like AND, OR, or NOT.

For example, let us put the values -30, 20, 20 into g(z) and see what comes out when x1, x2 are 0,0, then 0,1, then 1,0, and then 1,1.

[Figure: sigmoid curve g(z)]



AND: for 0,0 the value of z is -30. On the sigmoid curve above, -30 corresponds to 0. The other values are found in the same way. The table below is the truth table for AND. That is, h(x)=1 is output only when both x1 and x2 are 1.

Weights = -30, 20, 20

x1 | x2 | z = -30.x0 + 20.x1 + 20.x2 | h(x) = g(z) (AND)
0 | 0 | -30 + 20.0 + 20.0 = -30 | 0
0 | 1 | -30 + 20.0 + 20.1 = -10 | 0
1 | 0 | -30 + 20.1 + 20.0 = -10 | 0
1 | 1 | -30 + 20.1 + 20.1 = -30 + 40 = 10 | 1


OR: let us put the values -10, 20, 20 into g(z). The table below is the truth table for OR. That is, h(x)=1 is output when either x1 or x2, or both, is 1.

Weights = -10, 20, 20

x1 | x2 | z = -10.x0 + 20.x1 + 20.x2 | h(x) = g(z) (OR)
0 | 0 | -10 + 20.0 + 20.0 = -10 | 0
0 | 1 | -10 + 20.0 + 20.1 = -10 + 20 = 10 | 1
1 | 0 | -10 + 20.1 + 20.0 = -10 + 20 = 10 | 1
1 | 1 | -10 + 20.1 + 20.1 = -10 + 40 = 30 | 1


NOT: the third unit, in the second layer, is computed as NOT x1 AND NOT x2. That is, the values of NOT x1 and NOT x2 are found first and then combined with AND. The weights for NOT are 10, -20.

Weights = 10, -20

x1 | x2 | z = 10.x0 - 20.x1 | NOT(x1) | z = 10.x0 - 20.x2 | NOT(x2) | NOT x1 AND NOT x2
0 | 0 | 10 | 1 | 10 | 1 | 1
0 | 1 | 10 | 1 | -10 | 0 | 0
1 | 0 | -10 | 0 | 10 | 1 | 0
1 | 1 | -10 | 0 | -10 | 0 | 0


So, putting these together, the values for the neural network above are as follows.

x1 | x2 | a1 | a2 | h(x)
0 | 0 | 0 | 1 | 1
0 | 1 | 0 | 0 | 0
1 | 0 | 0 | 0 | 0
1 | 1 | 1 | 0 | 1
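The truth tables above can be checked with a short sketch. The weights for AND and OR are the ones given in the text. Folding NOT x1 AND NOT x2 into a single unit with the weights 10, -20, -20 is an assumption made here for brevity, and the function names are hypothetical.

```python
import math

def g(z):
    """Sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))

def unit(weights, x1, x2):
    """One activation unit, rounded to 0 or 1; the bias input x0 is always 1."""
    w0, w1, w2 = weights
    return round(g(w0 * 1 + w1 * x1 + w2 * x2))

AND_W = (-30, 20, 20)
OR_W = (-10, 20, 20)
NOT_BOTH_W = (10, -20, -20)  # assumption: NOT x1 AND NOT x2 as one unit

def network(x1, x2):
    a1 = unit(AND_W, x1, x2)        # hidden unit 1: x1 AND x2
    a2 = unit(NOT_BOTH_W, x1, x2)   # hidden unit 2: NOT x1 AND NOT x2
    h = unit(OR_W, a1, a2)          # output unit: a1 OR a2
    return a1, a2, h

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, *network(x1, x2))
```

Each printed row has the columns x1, x2, a1, a2, h(x), in the same order as the combined table.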












Forward propagation 


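Layer 1 :

a = x

Layer 2 :

a = g(theta . x)

Layer 3 :

h(x) = g(theta . a)

Here x = [x0, x1, x2] holds the inputs, a = [a0, a1, a2] holds the activation units of a layer, and theta is the matrix of weights leading into that layer.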


The working units (activation units) of the first layer are simply the raw features. This is the input layer.

The second is the hidden layer. Its working units are obtained from the features of the first layer and the weights.

Last comes the output layer. Its unit is obtained from the units of the hidden layer and the weights.

Computing the value of the units in each layer from the units of the previous layer and their weights in this way is called forward propagation.
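The layer-by-layer computation can be sketched for any number of layers. The nested-list layout of the weights and both function names are assumptions for illustration; the sample weights are the AND, OR and combined-NOT values discussed earlier.

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(theta, activations):
    """theta has one row of weights per unit of the next layer.

    activations are the previous layer's values without the bias unit;
    the bias unit (always 1) is prepended here.
    """
    with_bias = [1.0] + list(activations)
    return [g(sum(w * a for w, a in zip(row, with_bias))) for row in theta]

def forward_propagation(thetas, x):
    a = list(x)               # Layer 1: a = x
    for theta in thetas:      # every later layer: a = g(theta . a)
        a = layer_forward(theta, a)
    return a

thetas = [
    [[-30, 20, 20], [10, -20, -20]],  # input layer -> two hidden units
    [[-10, 20, 20]],                  # hidden layer -> one output unit
]
print(forward_propagation(thetas, [1, 1]))
```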






Back propagation 


Back propagation finds which weights the units in our neural network should use so that the errors become smaller. To find the error at each unit, partial derivative values are computed starting from the last layer and moving backwards. These are then combined to find the cost of the network. In general, back propagation is used together with the gradient descent algorithm to set the weights of the neurons so that the cost is as low as possible.

delta = error of each node in the corresponding layer

Layer 3 : delta3 = h(x) - y

Layer 2 : delta2 = transpose of (theta) . delta3 .* a .* (1 - a)

Layer 1 : delta1 = transpose of (theta) . delta2 .* a .* (1 - a)

where g'(z) = a .* (1 - a) is g-prime, the derivative of the activation function g

.* = element-wise multiplication


-rf-./ ,,, (Accumulator matrix 1 
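The delta formulas can be sketched with plain lists. The function names and the layout of `theta` (one row per unit of the next layer) are assumptions for this illustration, and bias units are left out to keep it short.

```python
def output_delta(h, y):
    """Last layer: delta = h(x) - y, one entry per output unit."""
    return [hi - yi for hi, yi in zip(h, y)]

def hidden_delta(theta, next_delta, a):
    """delta = transpose of (theta) . next_delta .* a .* (1 - a)

    theta[j][i] is the weight from unit i of this layer to unit j of the next;
    a holds this layer's activations in the same order as theta's columns.
    """
    deltas = []
    for i, ai in enumerate(a):
        back = sum(theta[j][i] * next_delta[j] for j in range(len(next_delta)))
        deltas.append(back * ai * (1 - ai))  # a .* (1 - a) is g'(z)
    return deltas

delta3 = output_delta([0.8], [1])
delta2 = hidden_delta([[0.5, -1.0]], delta3, [0.6, 0.9])
print(delta3, delta2)
```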



Perceptron 


The perceptron is the basis of neural networks. It is a binary classification algorithm that separates the data with a straight line. Unlike logistic regression, however, it does not learn everything in one go. It takes as its basis the way a neuron learns little by little, and learns about the training data step by step.

In the example below, 4 training records are given. The records are built on the 2 features x1 and x2 and belong to the classes 0 and 1.

x1, x2, y
[0.4, 0.3, 1],
[0.6, 0.8, 1],
[0.7, 0.5, 1],
[0.9, 0.2, 0]


We have already seen that neural networks do not learn directly; they create several activation units in between and learn on that basis. Here too, the features and their weights are not combined to learn the hypothesis directly. An activation unit is computed in between. Then, based on its value, the weights are changed to suit the data, and learning proceeds in that corrected manner. As follows. Here the parameters are called weights.

https://gist.github.com/nithyadurai87/e6794ec008a7855681db4ba9164b54af




def predict(row, weights):
    # weights[0] is the bias weight; the last element of row is the label
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation > 0.0 else 0.0

def train_weights(dataset, l_rate, n_epoch):
    weights = [0.0 for i in range(len(dataset[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in dataset:
            error = row[-1] - predict(row, weights)
            sum_error += error**2
            # Update the bias weight, then the weight of each feature
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        print('epoch=%d, error=%.2f' % (epoch, sum_error))
    print(weights)

dataset = [[0.4,0.3,1],
           [0.6,0.8,1],
           [0.7,0.5,1],
           [0.9,0.2,0]]

l_rate = 0.1
n_epoch = 6

train_weights(dataset, l_rate, n_epoch)


Output:

epoch=0, error=2.00
epoch=1, error=2.00
epoch=2, error=2.00
epoch=3, error=2.00
epoch=4, error=1.00
epoch=5, error=0.00
[0.1, -0.16, 0.06999999999999998]



Learning starts with the weights that are combined with the given features set to 0, 0, 0. The first of these is the weight for the bias unit x0. We have already seen that the bias unit always has the value 1. The next two are the weights for x1 and x2. With these, the activation unit for the first record [0.4, 0.3, 1] is found using the following formula. The rule that turns it into a prediction is called the heaviside activation function. It is another kind of activation function, like sigmoid.

Activation_unit_1 = w0.x0 + w1.x1 + w2.x2
= 0(1) + 0(0.4) + 0(0.3)
= 0

if Activation_unit > 0, Predict 1
else Predict 0.

If the value found is greater than 0 the prediction is 1, otherwise it is 0. Here the prediction is 0, but the training data gives 1. When the actual value does not match the prediction made from the activation unit (1 != 0), the weights must be changed before moving to the next record. The new weights are found with the following formula.

w0 = w0 + learning_rate * (actual-predict) * x0

Each weight adds a correction, scaled by the learning rate, to its old value. The learning rate is the one we used in gradient descent: it controls how large each update is. Its value is set to 0.1 here, which means the weights are adjusted in small steps. It is multiplied by the difference between the actual value and the predicted value, and by the value of the feature attached to that weight. This gives the new weight.

The weights found with the formula are as follows.

w0 = 0 + 0.1 * 1 * 1 = 0.10
w1 = 0 + 0.1 * 1 * 0.4 = 0.04
w2 = 0 + 0.1 * 1 * 0.3 = 0.03
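The update rule can be written as a small function. The name `update_weights` and its argument order are assumptions for illustration; the arithmetic is the formula given above, with x0 fixed at 1 for the bias unit.

```python
def update_weights(weights, features, actual, predicted, learning_rate=0.1):
    """Apply w = w + learning_rate * (actual - predicted) * x to every weight."""
    error = actual - predicted
    inputs = [1.0] + list(features)  # x0 is the bias unit, always 1
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

# First record [0.4, 0.3, 1]: predicted 0, actual 1
new_weights = update_weights([0.0, 0.0, 0.0], [0.4, 0.3], actual=1, predicted=0)
print(new_weights)
```

When the prediction is already correct the error is 0, so the weights come back unchanged.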


Using these new weights, the activation unit for the 2nd record [0.6, 0.8, 1] is found as follows.

Activation_unit_2 = w0.x0 + w1.x1 + w2.x2
= 0.1(1) + 0.04(0.6) + 0.03(0.8)
= 0.1 + 0.024 + 0.024
= 0.148

Since this is greater than 0, the prediction is 1. The data also says 1. So, without changing the weights, the activation unit for the 3rd record [0.7, 0.5, 1] is found.

Activation_unit_3 = w0.x0 + w1.x1 + w2.x2
= 0.1(1) + 0.04(0.7) + 0.03(0.5)
= 0.1 + 0.028 + 0.015
= 0.143

Here too the prediction is 1, and the data also says 1. Without changing the weights, the activation unit for the 4th record [0.9, 0.2, 0] is found.

Activation_unit_4 = w0.x0 + w1.x1 + w2.x2
= 0.1(1) + 0.04(0.9) + 0.03(0.2)
= 0.1 + 0.036 + 0.006
= 0.142

Here the prediction is 1, but the actual value is 0. So the weights are computed again.

w0 = 0.1 + 0.1 * -1 * 1 = 0.0
w1 = 0.04 + 0.1 * -1 * 0.9 = -0.05
w2 = 0.03 + 0.1 * -1 * 0.2 = 0.01

In this way, out of the 4 training records given, 2 were predicted correctly and 2 wrongly. With this, one epoch ends. That is, one pass in which the algorithm has been applied to all the training data in the manner described above is called 1 epoch. As follows.


Epoch = 0

x1 | x2 | y | weights | activation unit | predicted_y | updated weights for next row (if y != predicted_y)
0.4 | 0.3 | 1 | 0, 0, 0 | 0*1 + 0*0.4 + 0*0.3 = 0 | 0 | 0 + 0.1 * 1 = 0.10; 0 + 0.1 * 1 * 0.4 = 0.04; 0 + 0.1 * 1 * 0.3 = 0.03
0.6 | 0.8 | 1 | 0.1, 0.04, 0.03 | 0.1*1 + 0.04*0.6 + 0.03*0.8 = 0.1 + 0.024 + 0.024 = 0.148 | 1 | -
0.7 | 0.5 | 1 | 0.1, 0.04, 0.03 | 0.1*1 + 0.04*0.7 + 0.03*0.5 = 0.1 + 0.028 + 0.015 = 0.143 | 1 | -
0.9 | 0.2 | 0 | 0.1, 0.04, 0.03 | 0.1*1 + 0.04*0.9 + 0.03*0.2 = 0.1 + 0.036 + 0.006 = 0.142 | 1 | 0.1 + 0.1 * -1 = 0.0; 0.04 + 0.1 * -1 * 0.9 = -0.05; 0.03 + 0.1 * -1 * 0.2 = 0.01


The values found at the end of one epoch are used with the training data of the next epoch. In this way the epochs are computed 6 times. As follows.














Epoch = 1

x1 | x2 | y | weights | activation unit | predicted_y | updated weights for next row (if y != predicted_y)
0.4 | 0.3 | 1 | 0, -0.05, 0.01 | 0*1 + -0.05*0.4 + 0.01*0.3 = 0 + -0.02 + 0.003 = -0.017 | 0 | 0 + 0.1 * 1 = 0.10; -0.05 + 0.1 * 1 * 0.4 = -0.01; 0.01 + 0.1 * 1 * 0.3 = 0.04
0.6 | 0.8 | 1 | 0.1, -0.01, 0.04 | 0.1*1 + -0.01*0.6 + 0.04*0.8 = 0.1 + -0.006 + 0.032 = 0.126 | 1 | -
0.7 | 0.5 | 1 | 0.1, -0.01, 0.04 | 0.1*1 + -0.01*0.7 + 0.04*0.5 = 0.1 + -0.007 + 0.02 = 0.113 | 1 | -
0.9 | 0.2 | 0 | 0.1, -0.01, 0.04 | 0.1*1 + -0.01*0.9 + 0.04*0.2 = 0.1 + -0.009 + 0.008 = 0.099 | 1 | 0.1 + 0.1 * -1 = 0.0; -0.01 + 0.1 * -1 * 0.9 = -0.1; 0.04 + 0.1 * -1 * 0.2 = 0.02

Epoch = 2

x1 | x2 | y | weights | activation unit | predicted_y | updated weights for next row (if y != predicted_y)
0.4 | 0.3 | 1 | 0, -0.1, 0.02 | 0*1 + -0.1*0.4 + 0.02*0.3 = 0 + -0.04 + 0.006 = -0.034 | 0 | 0 + 0.1 * 1 = 0.10; -0.1 + 0.1 * 1 * 0.4 = -0.06; 0.02 + 0.1 * 1 * 0.3 = 0.05
0.6 | 0.8 | 1 | 0.1, -0.06, 0.05 | 0.1*1 + -0.06*0.6 + 0.05*0.8 = 0.1 + -0.036 + 0.04 = 0.104 | 1 | -
0.7 | 0.5 | 1 | 0.1, -0.06, 0.05 | 0.1*1 + -0.06*0.7 + 0.05*0.5 = 0.1 + -0.042 + 0.025 = 0.083 | 1 | -
0.9 | 0.2 | 0 | 0.1, -0.06, 0.05 | 0.1*1 + -0.06*0.9 + 0.05*0.2 = 0.1 + -0.054 + 0.01 = 0.056 | 1 | 0.1 + 0.1 * -1 = 0.0; -0.06 + 0.1 * -1 * 0.9 = -0.15; 0.05 + 0.1 * -1 * 0.2 = 0.03



























Epoch = 3

x1 | x2 | y | weights | activation unit | predicted_y | updated weights for next row (if y != predicted_y)
0.4 | 0.3 | 1 | 0, -0.15, 0.03 | 0*1 + -0.15*0.4 + 0.03*0.3 = 0 + -0.06 + 0.009 = -0.051 | 0 | 0 + 0.1 * 1 = 0.10; -0.15 + 0.1 * 1 * 0.4 = -0.11; 0.03 + 0.1 * 1 * 0.3 = 0.06
0.6 | 0.8 | 1 | 0.1, -0.11, 0.06 | 0.1*1 + -0.11*0.6 + 0.06*0.8 = 0.1 + -0.066 + 0.048 = 0.082 | 1 | -
0.7 | 0.5 | 1 | 0.1, -0.11, 0.06 | 0.1*1 + -0.11*0.7 + 0.06*0.5 = 0.1 + -0.077 + 0.03 = 0.053 | 1 | -
0.9 | 0.2 | 0 | 0.1, -0.11, 0.06 | 0.1*1 + -0.11*0.9 + 0.06*0.2 = 0.1 + -0.099 + 0.012 = 0.013 | 1 | 0.1 + 0.1 * -1 = 0.0; -0.11 + 0.1 * -1 * 0.9 = -0.2; 0.06 + 0.1 * -1 * 0.2 = 0.04

Epoch = 4

x1 | x2 | y | weights | activation unit | predicted_y | updated weights for next row (if y != predicted_y)
0.4 | 0.3 | 1 | 0, -0.2, 0.04 | 0*1 + -0.2*0.4 + 0.04*0.3 = 0 + -0.08 + 0.012 = -0.068 | 0 | 0 + 0.1 * 1 = 0.10; -0.2 + 0.1 * 1 * 0.4 = -0.16; 0.04 + 0.1 * 1 * 0.3 = 0.07
0.6 | 0.8 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.6 + 0.07*0.8 = 0.1 + -0.096 + 0.056 = 0.06 | 1 | -
0.7 | 0.5 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.7 + 0.07*0.5 = 0.1 + -0.112 + 0.035 = 0.023 | 1 | -
0.9 | 0.2 | 0 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.9 + 0.07*0.2 = 0.1 + -0.144 + 0.014 = -0.03 | 0 | -



Epoch = 5

x1 | x2 | y | weights | activation unit | predicted_y | updated weights for next row (if y != predicted_y)
0.4 | 0.3 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.4 + 0.07*0.3 = 0.1 + -0.064 + 0.021 = 0.057 | 1 | -
0.6 | 0.8 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.6 + 0.07*0.8 = 0.1 + -0.096 + 0.056 = 0.06 | 1 | -
0.7 | 0.5 | 1 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.7 + 0.07*0.5 = 0.1 + -0.112 + 0.035 = 0.023 | 1 | -
0.9 | 0.2 | 0 | 0.1, -0.16, 0.07 | 0.1*1 + -0.16*0.9 + 0.07*0.2 = 0.1 + -0.144 + 0.014 = -0.03 | 0 | -



Only in the 6th epoch are all the training records predicted correctly. So its weights can be taken as the weights the algorithm uses to predict new data. When the prediction for a record is right the weights are kept; when it is wrong the weights are found again, and this is repeated until every prediction is right. For this reason it is called an error-driven learning algorithm. The MLP (Multi-Layer Perceptron), which is built on this basis, is what forms neural networks.



Artificial Neural Networks 


A perceptron is based on how a single neuron learns, whereas a Multi-layer perceptron is based on how a brain made of many neurons learns. Just as the human brain works on top of neurons, perceptrons work on top of data. Connecting perceptrons in the form of a directed acyclic graph produces the MLP. This is called an Artificial neural network.

We have already seen that the perceptron classifies data that can be separated by a straight line. The MLP can be used to separate data arranged in a non-linear way. That is why it is called a universal function approximator. In the same way, the technique called kernelization also helps to separate data arranged non-linearly. We have already seen this in the section on SVM.

In the example below, 16 records are given. X holds the 2 features and y holds the class each record belongs to; both are given for training. Since the data is to be separated into three classes, this is an example of multi-class classification.


https://gist.github.com/nithyadurai87/b95e0ccd56464646da32ffdddb8b457f 




from mlxtend.classifier import MultiLayerPerceptron as MLP
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
import numpy as np

X = np.asarray([[6.1,1.4],[7.7,2.3],[6.3,2.4],[6.4,1.8],[6.2,1.8],[6.9,2.1],
                [6.7,2.4],[6.9,2.3],[5.8,1.9],[6.8,2.3],[6.7,2.5],[6.7,2.3],
                [6.3,1.9],[6.5,2.1],[6.2,2.3],[5.9,1.8]])

# Standardize each feature to zero mean and unit variance
X = (X - X.mean(axis=0)) / X.std(axis=0)

y = np.asarray([0,2,2,1,2,2,2,2,2,2,2,2,2,2,2,2])

# The print_progress value is cut off in the source; 3 is assumed here
# because it matches the progress lines shown in the output
nn = MLP(hidden_layers=[50], l2=0.00, l1=0.0, epochs=150, eta=0.05,
         momentum=0.1, decrease_const=0.0, minibatches=1, random_seed=1,
         print_progress=3)
nn = nn.fit(X, y)

fig = plot_decision_regions(X=X, y=y, clf=nn, legend=2)
plt.show()

print('Accuracy(epochs = 150): %.2f%%' % (100 * nn.score(X, y)))

# Train again with more epochs
nn.epochs = 250
nn = nn.fit(X, y)

fig = plot_decision_regions(X=X, y=y, clf=nn, legend=2)
plt.title('epochs = 250')
plt.show()

print('Accuracy(epochs = 250): %.2f%%' % (100 * nn.score(X, y)))

# Cost per epoch with minibatches=1
plt.plot(range(len(nn.cost_)), nn.cost_)
plt.title('Gradient Descent training (minibatches=1)')
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.show()

# Cost per epoch with one minibatch per training example
nn.minibatches = len(y)
nn = nn.fit(X, y)

plt.plot(range(len(nn.cost_)), nn.cost_)
plt.title('Stochastic Gradient Descent (minibatches=no. of training examples)')
plt.ylabel('Cost')
plt.xlabel('Epochs')
plt.show()


When the MLP is trained with this data, it separates the data as follows. We can see that a record that should have been placed in class 1 has not been handled correctly and has been placed in class 0.

So let us change the number of epochs given to the MLP from 150 to 250 and look again. Now we can see that everything has been classified correctly.


nn.epochs = 250 
nn = nn.fit(X, y) 

fig = plot_decision_regions(X=X, y=y, clf=nn, legend=2) 

plt.title('epochs = 250') 

plt.show()


epochs = 250 
Thus we can see that the accuracy is 93.75% with 150 epochs and 100.00% with 250 epochs.

Iteration: 150/150 | Cost 0.06 | Elapsed: 0:00:00 | ETA: 0:00:00
Accuracy(epochs = 150): 93.75%

Iteration: 250/250 | Cost 0.04 | Elapsed: 0:00:00 | ETA: 0:00:00
Accuracy(epochs = 250): 100.00%


How the cost value falls with each epoch is drawn as a plot. When the value of minibatches, one of the parameters given to the MLP for training, is set to 1, the MLP is trained on the data by the gradient descent method. We have already seen this method in the section on the perceptron.


Gradient Descent training (minibatches=l) 









Likewise, when the value of minibatches is set to the number of training examples given, the MLP is trained by the stochastic gradient descent method. Here the weights are updated after each training example, instead of once per epoch from the cost over all the training data taken together. This is why, in the plot below, the cost of successive epochs moves in a zig-zag manner. When the training data is very large, gradient descent takes a long time to go through all of it for every update and reach the global optimum. In such situations stochastic gradient descent can be used. Gradient descent that uses all the training data for each update is also called batch gradient descent.
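The two ways of updating can be compared on a very small problem. This is only a sketch under stated assumptions: a one-weight linear model `y = w * x` with squared error stands in for the MLP, and the function names and data are made up.

```python
def batch_epoch(w, data, eta):
    """Gradient descent: one update per epoch, from the gradient over all examples."""
    grad = sum((w * x - y) * x for x, y in data) / len(data)
    return w - eta * grad

def stochastic_epoch(w, data, eta):
    """Stochastic gradient descent: one update after every single example."""
    for x, y in data:
        w = w - eta * (w * x - y) * x
    return w

data = [(1, 2), (2, 4), (3, 6)]  # generated by y = 2x
w_batch = w_sgd = 0.0
for _ in range(200):
    w_batch = batch_epoch(w_batch, data, eta=0.05)
    w_sgd = stochastic_epoch(w_sgd, data, eta=0.05)
print(w_batch, w_sgd)
```

Both weights move towards 2; the stochastic version takes three small steps per epoch where the batch version takes one.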


Stochastic Gradient Descent (minibatches=no. of training examples) 






□□□□□□□□ 


Machine learning does not end here. Deep Learning, AI, Neural Networks and many other new developments keep arriving in computing. We need to keep following them through the lessons available on the internet, websites, sites such as StackOverFlow, and videos.



□□□□□□□□□□ 


My videos can be watched in the following YouTube Playlist.

https://www.youtube.com/watch?v=iHG8We58HVY&list=PL5itdT07Pm8wxRaPWljPntnBmnOs4ExDM

ML-01 Introduction to machine learning in Tamil - an introduction
https://www.youtube.com/watch?v=iHG8We58HVY

ML-02 Introduction to Machine Learning Algorithms in Tamil
https://www.youtube.com/watch?v=AYMuT05i4gE

ML-03 Pandas - an introduction - Introduction to Pandas in Tamil
https://www.youtube.com/watch?v=ljrK84iZv7g

ML-04 Machine Learning Model Creation in Tamil
https://www.youtube.com/watch?v=Nz6iJOZli-k

ML-05 Machine Learning Model - Prediction - in Tamil
https://www.youtube.com/watch?v=05HMDKepzRc

ML-06 Feature Selection - Manipulated variable - Disturbance Variable
https://www.youtube.com/watch?v=H85tTH HLMw

ML-07 Feature Selection - Process Variable - RFE Technique - In Tamil
https://www.youtube.com/watch?v=DyqlK24vlso

ML 08 - Machine Learning in Tamil - 08 - Improving Model Score
https://www.youtube.com/watch?v=6clvCfFI6qI

ML 09 - Machine Learning in Tamil - Outliers Removal
https://www.youtube.com/watch?v=SfBNynpsoyO

ML 10 - Machine Learning in Tamil - Explanatory data Analysis
https://www.youtube.com/watch?v=SliSuYJ-xiU

ML 11 - Machine Learning in Tamil - Simple Linear Regression
https://www.youtube.com/watch?v=OB36E9yvlPI

ML 12 - Machine Learning in Tamil - Gradient Descent
https://www.youtube.com/watch?v= 3Cfw2gmOhI

ML 13 - Machine Learning in Tamil - Multiple Linear Regression
https://www.youtube.com/watch?v=ECK4bjIrWjw

ML 14 - Machine Learning in Tamil - Polynomial Regression
https://www.youtube.com/watch?v=8dJML0Xvzro

ML 15 - Machine Learning in Tamil - Feature extraction using vectors
https://www.youtube.com/watch?v=-Xktzn9XxGg

ML 16 - Machine Learning in Tamil - Natural Language ToolKit
https://www.youtube.com/watch?v=yZLG5hOIvPM

ML 17 - Machine Learning in Tamil - Logistic Regression
https://www.youtube.com/watch?v=dXEnjS7Xjqs

ML 18 - Machine Learning in Tamil - Multi class classification
https://www.youtube.com/watch?v=R IXGhOlEoA

ML 19 - Machine Learning in Tamil - Neural Networks
https://www.youtube.com/watch?v=8pOBrF7bfqs







□□□□□□□□□□ □□□ □□□□□□□□□□□ 
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aLL^JD ft6®frlgJL_U^^6OT CTgffllLI 6S[6)^[LI[^&m ^^gJL_ULDn:6OT ^liff^agrT 

6ii60)g 6iil60)L^LL(Lb &T6ii^54(g)Lb S^60)6iiLun:6OT ^aeu^aeogrr Gl^n:i_ij.?^lLun:a^ 

^(^li ^grrLDmLi ^(^^Glugueiigj. 


. 260)g, ^effl CT6OT u^g^i_a eueoaftefflg^Lb 6ijl6iig[aa60)6TT ^(^eugj. 


©^§J60)JDLLjl6OT [glaL^6Iia60)6TT CT(S\^§J60)gLJU§J. 


. CT6ii([5Lb uraa^fflaa ejgjeumLi Lurreiii^a^LDrreOT G1 [bjS1li!j 1^ 6iil6ii[j[5ja60)6rT 
6Ul^[5J(^6U^. 


. eui^eiJlg^Lb, L4^^a[aa6n:n:a6iiLb, 6iiL_(S\.sa6TTn:a6iiLb eiSleugaaeoeTT 

Gl6II6fflLLjl(B6II§J. 


. 6iil([5LJU(j^6rr6rT CTeui^ii uraaefflaaeurrLb. 


. aLL^JD aeorfrlgjLULb eiile^^LULDcra S6ii60OT(S\Lb. 



• Before you begin contributing, you are expected to assign your copyright to Kaniyam.

• Anyone may begin contributing after sending an email containing the following details, as a declaration, to editor@kaniyam.com.


o Subject of the email: Copyright assignment

o Body of the email

■ I declare that all the works I send to Kaniyam were first created for Kaniyam.

■ For this purpose, I assign to Kaniyam any copyright I may hold in them.

■ Your full name


If someone else is already contributing in an area where you would like to contribute, try to work together with them.

Articles may be translations, or may be written from what a knowledgeable person has explained.



• Works may also be published as a series.

• They may be in any flavor suited to this field: technology, policy explanation, advocacy, stories, satire, and so on.

• You may write in whatever style comes naturally to you.

• Send your works as a simple text document to editor@kaniyam.com.

• You may also contribute in other ways, including site maintenance and support.

• If you have questions, write to editor@kaniyam.com.


• This is an effort undertaken for people who wish to learn computing technology.

• You do not have to be a highly skilled expert to contribute.

• It is enough to be interested in explaining what you know in the simplest way you can.

• Its growth is in the hands of each one of us.

• If you find shortcomings, report them properly and help pave the way for improvement.


Copyright © 2019 Kaniyam.
Articles published in Kaniyam are provided under the Creative Commons terms at
http://creativecommons.org/licenses/by-sa/3.0/

Accordingly, permission is granted to copy, distribute, display, adapt, and commercially use the articles published in Kaniyam, provided that due credit is given to Kaniyam and to the author who created them.


Editor: Shrinivasan - editor@kaniyam.com +91 98417 95468

The opinions expressed in the articles belong to their respective authors.



Vision

An environment in which virtual resources, tools, and bodies of knowledge related to the Tamil language and its communities are available to everyone with free and open access.


Mission

To ensure that the use of the Tamil language grows in step with scientific and socio-economic development, and to make all bodies of knowledge and resources available to everyone with free and open access.

• Kaniyam e-magazine - kaniyam.com


• Free Tamil ebooks under Creative Commons licenses - FreeTamilEbooks.com




Free software

• Text-to-audio converter - Text to Speech

• Character recognizer - Optical Character Recognition

• Character recognizer (OCR) for Wikisource

• Sending ebooks to a Kindle device - Send2Kindle

• Small tools for Wikipedia

• Ebook creation tool

• Text-to-audio converter - web application

• Sangam Literature - Android app

• FreeTamilEbooks - Android app











• FreeTamilEbooks - iOS app


WikisourceEbooksReport - download list of Wikisource ebooks for Indian languages

FreeTamilEbooks.com - Download counter - download list of ebooks

• Quickly proofreading the ebooks available on Wikisource through staff


• Hiring a full-time programmer to develop various free software

• Training workshops for Tamil NLP

• Creating a Kaniyam readers' circle

• Identifying and encouraging people who create free software and resources under Creative Commons licenses

• Developing more contributors to the Kaniyam magazine

• A web application for ebook creation

• A web application for OCR

• Publishing Tamil podcasts


• Translating the place and town names on OpenStreetMap.org into Tamil

• Mapping all of Tamil Nadu on OpenStreetMap.org

• Providing children's stories in audio form

• Organizing Ta.wiktionary.org so that it is suitable for an API

• Creating an app to record audio for Ta.wiktionary.org

• Creating a Tamil spell checker

• Creating a Tamil root-word finder

• Adding all FreeTamilEbooks.com ebooks to Google Play Books and GoodReads.com








• Creating a web application for learning Tamil typing

• Creating a web application for learning to write and read Tamil (like aamozish.com/Course preface)

To carry out the projects and build the software listed above, we need the support of all of you. If you are able to contribute in any way, please email your details to kaniyamfoundation@gmail.com.




Transparency

All activities, projects, and software of the Kaniyam Foundation are open to everyone and 100% transparent. You can view the activities at this link, and the monthly reports with income and expenditure details at this link.

All software developed by the Kaniyam Foundation will be released as free software with source code, under one of the GNU GPL, Apache, BSD, MIT, or Mozilla licenses. All other resources created, including photographs, audio files, videos, ebooks, and articles, will be under Creative Commons licenses so that anyone can share and use them.




Donations

Your donations help us carry out the work of creating free resources for Tamil better and faster.

Send your donations to the following bank account, and then email the details to kaniyamfoundation@gmail.com right away.


Kaniyam Foundation 

Account Number: 606 1010 100 502 79 


Union Bank Of India 
West Tambaram, Chennai 


IFSC-UBIN0560618 


Account Type : Current Account 




QR Code for UPI apps

Note: The QR code may not work in some UPI apps. In that case, kindly use the account number and IFSC code above for internet banking.




BHIM UPI Payments Accepted at 
Kaniyam Foundation 



Account Number: 606101010050279, IFSC Code: UBIN0560618 


Scan and Pay using any UPI supported Apps 








Apps shown: iMobile, RBL Pay, Union UPI, BHIM, HDFC, PhonePe, PNB UPI, Baroda Pay, PayTM, Yes Pay, Indus Pay, SBI Pay, AXIS Pay, BOI UPI, Maha UPI

Generated By: https://nsisodiya.github.io/Universal-UPI-QR-Code-Generator










